1. Introduction

1.1. Self-bounding functions

The concept of self-bounding functions first appeared in Boucheron, Lugosi and Massart
(2000). In Boucheron, Lugosi and Massart (2003), the authors introduce the versatile
entropy method, which yields exponential concentration inequalities with good rates for
many kinds of functions of independent random variables. In Boucheron, Lugosi and Massart
(2009) the authors define weakly self-bounding functions, and show that Talagrand's
convex distance inequality is implied by their inequalities. For an elementary introduction
and examples, see the lecture notes Lugosi (2005).

Let $X_1, \ldots, X_n$ be random variables taking values in Polish spaces
$\Lambda_1, \ldots, \Lambda_n$. Denote the random vector $X := (X_1, \ldots, X_n)$, and let
$\Lambda := \Lambda_1 \times \Lambda_2 \times \ldots \times \Lambda_n$.

For a vector $x$ with $n$ coordinates, let $x_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$
be the vector created by dropping coordinate $i$. Let
$\Lambda_{-i} := \Lambda_1 \times \ldots \times \Lambda_{i-1} \times \Lambda_{i+1} \times \ldots \times \Lambda_n$.

Similarly, $X'$, $X(k)$, $X(k)'$, etc. will be random vectors defined on a common probability
space $(\Omega, \mathcal{F}_\Omega, \mathbb{P})$, taking values in $\Lambda$. The law of the random
vector $X$ is denoted by $\mu$, so $(\Lambda, \mathcal{F}, \mu)$ is the probability space induced
by the random vector $X$. Thus for $S \in \mathcal{F}$, $\mu(S) = \mathbb{P}(X \in S)$.

Let $g : \Lambda \to \mathbb{R}_+$ be a non-negative function; we will be interested in the
concentration properties of $g(X)$. We denote its centered version by
$f(x) := g(x) - \mathbb{E}(g(X))$.

In this paper, for each $i \le n$, $g_i$ denotes a measurable function from $\Lambda_{-i}$ to $\mathbb{R}$.

The following definitions of self-bounding functions are from Boucheron, Lugosi and Massart
(2009) (we made a slight generalization; they supposed that
$\Lambda_1 = \ldots = \Lambda_n = \mathcal{X}$).

Definition 1. A function $g : \Lambda \to \mathbb{R}$ is called $(a,b)$-self-bounding if for some
$a, b \ge 0$, for all $i = 1, \ldots, n$ and all $x \in \Lambda$,

1. $0 \le g(x) - g_i(x_{-i}) \le 1$, and
2. $\sum_{i=1}^n (g(x) - g_i(x_{-i})) \le a g(x) + b$.

A function $g : \Lambda \to \mathbb{R}$ is called weakly $(a,b)$-self-bounding if for all $x \in \Lambda$,

$$\sum_{i=1}^n (g(x) - g_i(x_{-i}))^2 \le a g(x) + b.$$

Remark 1.1. If $g$ is $(a,b)$-self-bounding, then it is also weakly $(a,b)$-self-bounding. If $g$ is
$(a,b)$-self-bounding for some $g_i$, then it is also $(a,b)$-self-bounding for

$$g_i(x_{-i}) := \inf_{x_i' \in \Lambda_i} g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n). \tag{1.1}$$

If $g$ is weakly $(a,b)$-self-bounding, then in this paper we will also assume that
$g_i(x_{-i}) \le g(x)$ for all $x \in \Lambda$, and in this case, we can choose $g_i$ as in (1.1).
In the rest of this paper, we assume that $g_i$ is chosen as in (1.1).

We define $\alpha$-self-bounding functions (the same way as in Paulin (2012)):

Definition 2. A function $g : \Lambda \to \mathbb{R}$ is called $\alpha$-$(a,b)$-self-bounding for
$a, b \ge 0$ if there exists a vector valued function $\alpha : \Lambda \to \mathbb{R}_+^n$,
$\alpha(x) = (\alpha_1(x), \ldots, \alpha_n(x))$, such that

1. $0 \le \alpha_i(x) \le 1$,
2. for every $x, y \in \Lambda$,

$$g(x) - g(y) \le \sum_{i=1}^n \alpha_i(x) 1[x_i \ne y_i],$$

3. and for every $x \in \Lambda$,

$$\sum_{i=1}^n \alpha_i(x) \le a g(x) + b.$$

Similarly, a function $g : \Lambda \to \mathbb{R}$ is called weakly $\alpha$-$(a,b)$-self-bounding for
$a, b \ge 0$ if there exists a vector valued function $\alpha : \Lambda \to \mathbb{R}_+^n$,
$\alpha(x) = (\alpha_1(x), \ldots, \alpha_n(x))$, such that

1. for every $x, y \in \Lambda$,

$$g(x) - g(y) \le \sum_{i=1}^n \alpha_i(x) 1[x_i \ne y_i],$$

2. and for every $x \in \Lambda$,

$$\sum_{i=1}^n \alpha_i(x)^2 \le a g(x) + b.$$

Remark 1.2. The following relations hold:

$(a,b)$-self-bounding $\Longrightarrow$ weakly $(a,b)$-self-bounding,

$\alpha$-$(a,b)$-self-bounding $\Longrightarrow$ weakly $\alpha$-$(a,b)$-self-bounding.

The reverse implications are false in general.

1.2. Concentration inequalities in dependent spaces 

Concentration inequalities have been proven for many different dependence structures, for
different classes of functions, and using different methods.

The first such results are due to Marton, who developed the transportation cost inequality
approach to prove concentration for Hamming Lipschitz functions, and Talagrand's convex
distance inequality, for contracting Markov chains in Marton (1996). She generalized her
results to larger classes of processes in Marton (1998) and in Marton (2003). Samson (2000)
uses a similar approach; he proves Talagrand's convex distance inequality, as well as some
further inequalities, for $\Phi$-mixing processes. Concentration for Euclidean Lipschitz
functions of dependent random variables is proven in Marton (2004).

Martingale type arguments have been used by several authors to prove concentration
inequalities in dependent spaces. They work well for Hamming Lipschitz functions, but are
difficult to generalize to other types of functions. For recent results, see Külske (2003),
Chazottes et al. (2007), and Kontorovich (2007). Wu (2006) uses transportation cost
inequalities to get concentration of Lipschitz functions (in various distances).

Chatterjee (2005) uses Stein's method of exchangeable pairs to show concentration
inequalities for Hamming Lipschitz functions of weakly dependent random variables.

The following definition allows us to quantify the dependence between the random
variables:

Definition (Dobrushin's interdependence matrix). Suppose $A = (a_{ij})$ is an $n \times n$ matrix
with nonnegative entries and zeroes on the diagonal such that for any $i$, and any $x, y \in \Lambda$,

$$d_{TV}(\mu_i(\cdot | x_{-i}), \mu_i(\cdot | y_{-i})) \le \sum_{j=1}^n a_{ij} 1[x_j \ne y_j],$$

where $d_{TV}$ is the total variational distance and $\mu_i(\cdot | x_{-i})$ is the conditional law of
$X_i$ given $X_{-i} = x_{-i}$. Then we say that $A$ is a Dobrushin interdependence matrix for the
random vector $X$ (or equivalently, for the measure $\mu$).

We define the "weighted" Hamming distance (Chatterjee (2005)):

Definition (Weighted Hamming distance). Let $c_1, \ldots, c_n$ be fixed positive constants, then
for $x, y \in \Lambda$, we call

$$d_c(x, y) := \sum_{i=1}^n c_i 1[x_i \ne y_i]$$

the weighted Hamming distance of $x$ and $y$. Similarly, for a set $S \subset \Lambda$, we can define

$$d_c(x, S) := \min_{y \in S} d_c(x, y).$$

One of the main results of Chatterjee (2005) is the following theorem: 

Theorem 1.1. (Theorem 4.3 of Chatterjee (2005)) Suppose that $A$ is a Dobrushin
interdependence matrix for $\mu$, satisfying $\|A\|_2 < 1$. Then for every $g : \Lambda \to \mathbb{R}$
which is 1-Lipschitz with respect to the weighted Hamming distance $d_c$, for $t \ge 0$, we have

$$\mathbb{P}(g(X) - \mathbb{E}g(X) \ge t),\ \mathbb{P}(g(X) - \mathbb{E}g(X) \le -t) \le \exp\left(-\frac{(1 - \|A\|_2) t^2}{\sum_{i=1}^n c_i^2}\right). \tag{1.2}$$

Remark 1.3. Although this theorem is a statement about functions, it is equivalent to the 
usual formulation of concentration inequalities involving sets. See Dzindzalieta (2012) for 
more details. 

Our goal in this paper is to generalize this theorem to self-bounding functions. 
2. Results 

2.1. Independent case 

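The following is the version of Theorem 1 of Boucheron, Lugosi and Massart (2009) that we
obtain:

Theorem 2.1. Let $X = (X_1, \ldots, X_n)$ be a vector of independent random variables, taking
values in $\Lambda$, and let $g : \Lambda \to \mathbb{R}$ be a non-negative measurable function such that
$Z = g(X)$ has finite mean. Let $a, b \ge 0$.

If $g$ is $(a,b)$-self-bounding, then for all $0 \le \theta < 1/a$,

$$\log \mathbb{E}\left[e^{\theta(Z - \mathbb{E}Z)}\right] \le \frac{(a\mathbb{E}Z + b)\theta^2}{2(1 - a\theta)},$$

and for all $t \ge 0$,

$$\mathbb{P}\{Z \ge \mathbb{E}Z + t\} \le \exp\left(-\frac{t^2}{2(a\mathbb{E}Z + b + at)}\right).$$

If $g$ is weakly $(a,b)$-self-bounding and for all $i \le n$, all $x \in \Lambda$, $g_i(x_{-i}) \le g(x)$,
then for all $0 \le \theta < 1/(2a)$,

$$\log \mathbb{E}\left[e^{\theta(Z - \mathbb{E}Z)}\right] \le \frac{(a\mathbb{E}Z + b)\theta^2}{1 - 2a\theta},$$

and for all $t \ge 0$,

$$\mathbb{P}\{Z \ge \mathbb{E}Z + t\} \le \exp\left(-\frac{t^2}{4(a\mathbb{E}Z + b + at)}\right).$$

Let us denote the unique positive solution of

$$\frac{e^{1/(4a)} - 1}{1/(4a)} = \frac{8}{5} \tag{2.1}$$

by $a_c \approx 0.285256$.

Suppose that $g$ is weakly $(a,b)$-self-bounding, and $g(x) - g_i(x_{-i}) \le 1$ for each $x \in \Lambda$.
Then for $0 \le t \le \mathbb{E}Z$, for $a \ge a_c$,

$$\mathbb{P}\{Z \le \mathbb{E}Z - t\} \le \exp\left(-\frac{t^2}{8(a\mathbb{E}Z + b)}\right).$$

For $a < a_c$,

$$\mathbb{P}\{Z \le \mathbb{E}Z - t\} \le \exp\left(-\frac{t^2}{5(a\mathbb{E}g(X) + b) + (2/3)t}\right).$$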
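The tail bounds of Theorem 2.1 are easy to evaluate numerically. The following is a minimal sketch, not part of the original text: the function names `critical_a`, `upper_tail_bound` and `lower_tail_bound` are hypothetical, and the code assumes $a\mathbb{E}Z + b > 0$ and $a > 0$. It also recovers the constant $a_c$ by bisection from the equation $(e^{1/(4a)} - 1)/(1/(4a)) = 8/5$.

```python
import math


def critical_a(target=8.0 / 5.0, lo=0.05, hi=5.0, iterations=200):
    """Bisection for a_c, the positive root of (e^{1/(4a)} - 1) / (1/(4a)) = 8/5."""
    def ratio(a):
        x = 1.0 / (4.0 * a)
        return math.expm1(x) / x

    # ratio(a) is decreasing in a, so the root is bracketed by [lo, hi].
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_tail_bound(t, mean_z, a, b, weak=False):
    """Bound on P{Z >= EZ + t}: self-bounding case, or weakly self-bounding if weak=True."""
    factor = 4.0 if weak else 2.0
    return math.exp(-t * t / (factor * (a * mean_z + b + a * t)))


def lower_tail_bound(t, mean_z, a, b, a_c=None):
    """Bound on P{Z <= EZ - t} for weakly self-bounding g, valid for 0 <= t <= EZ."""
    if not 0.0 <= t <= mean_z:
        raise ValueError("the lower tail bound is stated for 0 <= t <= EZ")
    if a_c is None:
        a_c = critical_a()
    d = a * mean_z + b
    if a >= a_c:
        return math.exp(-t * t / (8.0 * d))
    return math.exp(-t * t / (5.0 * d + (2.0 / 3.0) * t))
```

For instance, `upper_tail_bound(2.0, 10.0, 1.0, 0.0)` evaluates $\exp(-4/24)$, the bound for a $(1,0)$-self-bounding function with $\mathbb{E}Z = 10$ at $t = 2$.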



2.2. Weakly dependent case 

Theorem 2.2. Let $X = (X_1, \ldots, X_n)$ be a vector of random variables, taking values in
$\Lambda$. Let $A$ be a Dobrushin interdependence matrix for $X$, and suppose that $\|A\|_1 < 1$ and
$\|A\|_\infty \le 1$. Let $g : \Lambda \to \mathbb{R}$ be a non-negative measurable function such that
$Z = g(X)$ has finite mean. Let $a, b \ge 0$.

• If $g$ is $\alpha$-$(a,b)$-self-bounding, then for $0 \le \theta < (1 - \|A\|_1)/a$,

$$\log \mathbb{E}\left[e^{\theta(Z - \mathbb{E}Z)}\right] \le \frac{(a\mathbb{E}Z + b)\theta^2}{2(1 - \|A\|_1 - a\theta)},$$

and for every $t \ge 0$,

$$\mathbb{P}\{Z \ge \mathbb{E}Z + t\} \le \exp\left(-\frac{(1 - \|A\|_1) t^2}{2(a\mathbb{E}Z + b + at)}\right).$$

• If $g$ is weakly $\alpha$-$(a,b)$-self-bounding, then for $0 \le \theta < (1 - \|A\|_1)/(2a)$,

$$\log \mathbb{E}\left[e^{\theta(Z - \mathbb{E}Z)}\right] \le \frac{(a\mathbb{E}Z + b)\theta^2}{1 - \|A\|_1 - 2a\theta}, \tag{2.2}$$

and for all $t \ge 0$,

$$\mathbb{P}\{Z \ge \mathbb{E}Z + t\} \le \exp\left(-\frac{(1 - \|A\|_1) t^2}{4(a\mathbb{E}Z + b + at)}\right).$$

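Suppose that $g$ is weakly $\alpha$-$(a,b)$-self-bounding, and for every $x, x^* \in \Lambda$ differing
only in one coordinate, $|g(x) - g(x^*)| \le 1$. Then for $0 \ge \theta > -(1 - \|A\|_1)/(2a)$, the
following inequality holds:

$$(\log m(\theta))' \ge -\left(e^{-\theta} - 1\right) \frac{2}{1 - \|A\|_1} \left(a\mathbb{E}g(X) + b - \theta \frac{a(a\mathbb{E}g(X) + b)}{2(1 - \|A\|_1 + 2a\theta)}\right). \tag{2.3}$$

This implies that for $0 \le t \le \mathbb{E}Z$ and $a \ge a_c(1 - \|A\|_1)$,

$$\mathbb{P}\{Z \le \mathbb{E}Z - t\} \le \exp\left(-\frac{(1 - \|A\|_1) t^2}{8(a\mathbb{E}Z + b)}\right).$$

For $a < a_c(1 - \|A\|_1)$,

$$\mathbb{P}\{Z \le \mathbb{E}Z - t\} \le \exp\left(-\frac{t^2}{5(a\mathbb{E}g(X) + b)/(1 - \|A\|_1) + (2/3)t}\right).$$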
3. Applications 

3.1. The convex distance inequality 

Talagrand's convex distance inequality has been proven using the weakly self-bounding
property in Boucheron, Lugosi and Massart (2009), Section 2. Here we will use our results to
prove it for the dependent case.

The following lemmas are analogs of Proposition 13 of Boucheron, Lugosi and Massart
(2003) and Lemma 1 of Boucheron, Lugosi and Massart (2009) (see Appendix for the proofs).

Lemma 3.1. For any $S \in \mathcal{F}$, $d_T(x, S)$ is weakly $\alpha$-$(0,1)$-self-bounding.

Lemma 3.2. For any $S \in \mathcal{F}$, $d_T^2(x, S)$ is weakly $\alpha$-$(4,0)$-self-bounding, and satisfies
that for every $x, x^* \in \Lambda$ differing only in one coordinate,
$|d_T^2(x, S) - d_T^2(x^*, S)| \le 1$.

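Corollary 3.1.

$$\mathbb{P}(X \in S)\, \mathbb{E}\left[e^{d_T(X, S)^2 (1 - \|A\|_1)/26.1}\right] \le 1.$$

Proof. By Lemma 3.2, we can apply Theorem 2.2 to $g(x) := d_T^2(x, S)$ with $a = 4$, $b = 0$.
(2.3) gives that for $0 \ge \theta > -(1 - \|A\|_1)/8$,

$$(\log m(\theta))' \ge -\left(e^{-\theta} - 1\right) \frac{2}{1 - \|A\|_1} \left(4\mathbb{E}g(X) - \theta \frac{8\mathbb{E}g(X)}{1 - \|A\|_1 + 8\theta}\right).$$

Here $(e^{-\theta} - 1) \le (-\theta) \frac{e^{1/8} - 1}{1/8}$. Let us define $\theta^* := \frac{\theta}{1 - \|A\|_1}$, then we can write

$$(\log m(\theta))' \ge \theta^* \frac{e^{1/8} - 1}{1/8} \left(8\mathbb{E}g(X) - \theta^* \frac{16\mathbb{E}g(X)}{1 + 8\theta^*}\right).$$

By integration we get that

$$\log m(\theta) \le \frac{e^{1/8} - 1}{1/8} \mathbb{E}g(X) \left(3(\theta^*)^2 + \frac{1}{4}\theta^* - \frac{1}{32}\log(1 + 8\theta^*)\right)(1 - \|A\|_1).$$

Now applying Markov's inequality gives
$\log \mathbb{P}(X \in S) = \log \mathbb{P}(f(X) \le -\mathbb{E}g(X)) \le \log m(\theta) + \theta \mathbb{E}g(X)$.
To minimize this, we solve

$$\frac{e^{1/8} - 1}{1/8} \left(8\mathbb{E}g(X)\theta^* - \frac{16\mathbb{E}g(X)(\theta^*)^2}{1 + 8\theta^*}\right) = -\mathbb{E}g(X),$$

which gives $\theta^*_m \approx -0.0806628 > -1/8$, and thus

$$\log \mathbb{P}(X \in S) \le \left(\frac{e^{1/8} - 1}{1/8}\left(3(\theta^*_m)^2 + \frac{1}{4}\theta^*_m - \frac{1}{32}\log(1 + 8\theta^*_m)\right) + \theta^*_m\right) \mathbb{E}g(X) (1 - \|A\|_1).$$

On the other hand, by (2.2), we have that for $0 \le \theta < (1 - \|A\|_1)/8$,

$$\log \mathbb{E}\left[e^{\theta(Z - \mathbb{E}Z)}\right] \le \frac{4\mathbb{E}Z\theta^2}{1 - \|A\|_1 - 8\theta},$$

thus

$$\mathbb{P}(X \in S)\, \mathbb{E}\left[e^{\theta Z}\right] \le \exp\left(\mathbb{E}Z\left(\theta + \frac{4\theta^2}{1 - \|A\|_1 - 8\theta} - \frac{1 - \|A\|_1}{21.345}\right)\right) \le 1$$

for $\theta = \frac{1 - \|A\|_1}{26.1}$. □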
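The numerical constants in the proof of Corollary 3.1 can be checked with a few lines of code. The sketch below is not part of the original argument: it works in the rescaled variable $u = \theta^* = \theta/(1 - \|A\|_1)$, divides the exponent by $(1 - \|A\|_1)\mathbb{E}g(X)$, and the names `exponent`, `slope` and `minimizer` are hypothetical.

```python
import math

K = math.expm1(1.0 / 8.0) / (1.0 / 8.0)  # the factor (e^{1/8} - 1) / (1/8)


def exponent(u):
    """Bound on log P(X in S), divided by (1 - ||A||_1) E g(X), as a function of u = theta*."""
    return K * (3.0 * u * u + u / 4.0 - math.log1p(8.0 * u) / 32.0) + u


def slope(u):
    """Derivative of exponent(u); its root in (-1/8, 0) is the minimizer theta*_m."""
    return K * (8.0 * u - 16.0 * u * u / (1.0 + 8.0 * u)) + 1.0


def minimizer(lo=-0.1249, hi=0.0, iterations=200):
    """Bisection; slope is increasing on (-1/8, 0], negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_star_m = minimizer()
rate = -1.0 / exponent(theta_star_m)  # should be close to 21.345

# Exponent of the final bound at theta = (1 - ||A||_1) / 26.1, again divided by
# (1 - ||A||_1) E Z; it has to be non-positive for the corollary to hold.
c = 1.0 / 26.1
final_exponent = c + 4.0 * c * c / (1.0 - 8.0 * c) + exponent(theta_star_m)
```

Under these conventions, `theta_star_m` agrees with the value quoted in the proof, and `final_exponent` comes out slightly negative, which is what the choice $\theta = (1 - \|A\|_1)/26.1$ requires.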
3.2. Curie-Weiss model 

The well-known Curie-Weiss model of ferromagnetic interaction is the following: the state
space is $\Lambda = \{-1, 1\}^n$, and we denote a configuration by $\sigma = (\sigma_1, \ldots, \sigma_n)$.
The Hamiltonian for the system is

$$H(\sigma) := -\left(\frac{1}{n} \sum_{1 \le i < j \le n} \sigma_i \sigma_j + h \sum_{i=1}^n \sigma_i\right),$$

the probability density is

$$\mathbb{P}_\beta(\sigma) = (Z_\beta)^{-1} \exp(-\beta H(\sigma)),$$

where $Z_\beta := \sum_{\text{all configurations}} \exp(-\beta H(\sigma))$ is the normalizing constant.

Proposition 3.1. For $\sigma$ as above, the Dobrushin interdependence matrix $A$ satisfies

$$\|A\|_1, \|A\|_\infty, \|A\|_2 \le \beta.$$

Proof. We will now calculate the Dobrushin interdependence matrix for this system. Suppose
first that $h = 0$. Let $x$ and $y$ be two configurations, then we want to bound

$$d_{TV}(\mu_i(\cdot | x_{-i}), \mu_i(\cdot | y_{-i})).$$

Since $\sigma_i$ can only take the values $1$ or $-1$, the total variation distance is simply

$$d_{TV}(\mu_i(\cdot | x_{-i}), \mu_i(\cdot | y_{-i})) = |\mathbb{P}(\sigma_i = 1 | x_{-i}) - \mathbb{P}(\sigma_i = 1 | y_{-i})|.$$

Now by writing $m_i(x) := \frac{1}{n} \sum_{j \ne i} x_j$ and $m_i(y) := \frac{1}{n} \sum_{j \ne i} y_j$, we can write

$$\mathbb{P}(\sigma_i = 1 | x_{-i}) = \frac{\exp(\beta m_i(x))}{\exp(\beta m_i(x)) + \exp(-\beta m_i(x))},$$

so by denoting $g(t) := \frac{\exp(t)}{\exp(t) + \exp(-t)} = \frac{1}{1 + \exp(-2t)}$, we can write

$$|\mathbb{P}(\sigma_i = 1 | x_{-i}) - \mathbb{P}(\sigma_i = 1 | y_{-i})| = |g(\beta m_i(x)) - g(\beta m_i(y))|.$$

Now $g'(t) = \frac{2\exp(-2t)}{(1 + \exp(-2t))^2} = \frac{2}{\exp(-2t) + 2 + \exp(2t)}$, so $|g'(t)| \le \frac{1}{2}$, and
changing one spin in $x$ can change $m_i$ by at most $\frac{2}{n}$, so

$$|\mathbb{P}(\sigma_i = 1 | x_{-i}) - \mathbb{P}(\sigma_i = 1 | y_{-i})| \le \frac{1}{2} \beta |m_i(x) - m_i(y)| \le \sum_{j \ne i} \frac{\beta}{n} 1[x_j \ne y_j],$$

therefore we get a Dobrushin interdependence matrix $A$ with $a_{ij} = \frac{\beta}{n}$ for $i \ne j$. For this
matrix $A$, it is easy to see that

$$\|A\|_1 = \|A\|_\infty = \|A\|_2 = \beta\left(1 - \frac{1}{n}\right) < \beta.$$

Thus for the high temperature case $\beta < 1$, we can apply Theorem 2.2 or Corollary 3.1 to
get concentration inequalities.

One can show that an external magnetic field $h \ne 0$ does not change the Dobrushin
interdependence matrix, so the same results hold.
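For small $n$ the entries of the interdependence matrix can be checked by brute force. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the conditional law $\mathbb{P}(\sigma_i = 1 | x_{-i}) = 1/(1 + \exp(-2\beta(m_i(x) + h)))$, that is, the external field enters the exponent multiplied by $\beta$ (with $h = 0$ this is exactly the formula used in the proof).

```python
import math
from itertools import product


def cond_prob_plus(i, x, beta, h=0.0):
    """P(sigma_i = +1 | other spins), assuming the field enters as beta * h (see lead-in)."""
    n = len(x)
    m_i = sum(x[j] for j in range(n) if j != i) / n
    return 1.0 / (1.0 + math.exp(-2.0 * beta * (m_i + h)))


def max_influence(n, beta, h=0.0):
    """Largest change of the conditional law of spin i when a single other spin j is flipped.

    For a two-point state space the total variation distance equals the absolute
    difference of the probabilities of +1, so this is the best constant a_ij.
    """
    worst = 0.0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            p = cond_prob_plus(i, x, beta, h)
            for j in range(n):
                if j == i:
                    continue
                y = list(x)
                y[j] = -y[j]  # flip spin j only
                worst = max(worst, abs(p - cond_prob_plus(i, y, beta, h)))
    return worst
```

For example, `max_influence(5, 0.8)` stays below $\beta/n = 0.16$, in line with $a_{ij} = \beta/n$, and the same holds with a non-zero field under the stated assumption.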

Now we are going to show a concentration inequality for the average magnetization of the
Curie-Weiss model. Let us denote the average magnetization by
$m := \frac{1}{n} \sum_{i=1}^n \sigma_i$. Then we have the following proposition:

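Proposition 3.2. If $h \le 0$, then for each $t \ge 0$,

$$\mathbb{P}(m(\sigma) \ge \mathbb{E}(m(\sigma)) + t) \le \exp\left(-\frac{(1 - \beta) n t^2}{8/(1 + e^{-2h}) + 4t}\right),$$

$$\mathbb{P}(m(\sigma) \le \mathbb{E}(m(\sigma)) - t) \le \exp\left(-\frac{(1 - \beta) n t^2}{32/(1 + e^{-2h})}\right).$$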



D. Paulin/ Concentration for Weak Dependence by Stein's Method 
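Similarly, in the $h \ge 0$ case, we have

$$\mathbb{P}(m(\sigma) \ge \mathbb{E}(m(\sigma)) + t) \le \exp\left(-\frac{(1 - \beta) n t^2}{32/(1 + e^{2h})}\right),$$

$$\mathbb{P}(m(\sigma) \le \mathbb{E}(m(\sigma)) - t) \le \exp\left(-\frac{(1 - \beta) n t^2}{8/(1 + e^{2h}) + 4t}\right).$$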

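Remark 3.1. For large values of $|h|$, this proposition is a much better result than what we
could get from Theorem 1.1 using only the Hamming Lipschitz property, which does not capture
the fact that in such cases $\sigma_i$, and thus $m(\sigma)$, has small variance.

Proof. Let $N_+(\sigma) := \sum_{i=1}^n 1[\sigma_i = 1]$ be the number of $+1$ spins, then
$m = \frac{2N_+ - n}{n}$.

Suppose that $h \le 0$. $N_+(\sigma)$ is a sum of non-negative variables, so one can easily see that
it is $\alpha$-$(1,0)$-self-bounding, and thus, by Theorem 2.2, we have for every $t \ge 0$,

$$\mathbb{P}(N_+(\sigma) \ge \mathbb{E}(N_+(\sigma)) + t) \le \exp\left(-\frac{(1 - \beta) t^2}{2\mathbb{E}(N_+(\sigma)) + 2t}\right),$$

$$\mathbb{P}(N_+(\sigma) \le \mathbb{E}(N_+(\sigma)) - t) \le \exp\left(-\frac{(1 - \beta) t^2}{8\mathbb{E}(N_+(\sigma))}\right).$$

We can easily see that $\mathbb{P}(\sigma_i = 1) \le \frac{e^h}{e^h + e^{-h}} = \frac{1}{1 + e^{-2h}}$, and thus

$$\mathbb{E}(N_+(\sigma)) \le \frac{n}{1 + e^{-2h}},$$

which, in turn, implies the following concentration inequalities for $m(\sigma)$:

$$\mathbb{P}(m(\sigma) \ge \mathbb{E}(m(\sigma)) + t) = \mathbb{P}\left(N_+ \ge \mathbb{E}(N_+) + \frac{n}{2} t\right) \le \exp\left(-\frac{(1 - \beta) n t^2}{8/(1 + e^{-2h}) + 4t}\right),$$

$$\mathbb{P}(m(\sigma) \le \mathbb{E}(m(\sigma)) - t) = \mathbb{P}\left(N_+ \le \mathbb{E}(N_+) - \frac{n}{2} t\right) \le \exp\left(-\frac{(1 - \beta) n t^2}{32/(1 + e^{-2h})}\right).$$

The $h \ge 0$ case is analogous. □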

It is possible to get similar results for other statistical physical models as well, if the 
temperature is sufficiently high. For an estimate on the Dobrushin interdependence matrix 
depending on the energy function of the model, see Lemma 4.4 of Chatterjee (2005). □ 

3.3. Examples of $\alpha$-self-bounding functions

The following kinds of self-bounding functions can easily be shown to be $\alpha$-self-bounding
too:

• configuration functions,

• supremum of non-negative empirical processes,

• fractionally subadditive functions,

• L2 norm of a symmetric real matrix (or Hermitian complex matrix).
It is not clear to the author whether the following are $\alpha$-self-bounding:

• combinatorial entropies,

• submodular functions.

4. Open problems

We propose two problems:

1. (Concentration for low temperature Curie-Weiss model) Find the form of the
concentration inequalities (at first for Hamming-Lipschitz functions) that hold in the
low temperature case. One could try to prove concentration inequalities under
conditioning. We expect that under conditioning on the sum of the spins being positive,
concentration inequalities hold with a factor $\beta/(\beta - 1)$ times weaker than in the
independent case (or possibly $(1 + g(\beta - 1))/g(\beta - 1)$ times weaker, for some monotone
increasing $g$ with $g(0) = 0$). In particular, for $\beta$ large, we expect similar behavior as in
the independent case.

A central limit theorem is known to hold under conditioning, see Ellis and Wang (1990).
See also Chazottes et al. (2007) for stretched exponential inequalities, and Chatterjee
(2007) for a concentration inequality for the mean magnetization that holds at all
temperatures.

2. (Concentration for Ising model) Prove that concentration inequalities hold for the
1-dimensional Ising model at all temperatures. We expect that the constants are roughly
$1/(1 - \tanh(2\beta))$ times worse than in the independent case (Lubetzky and Sly (2009),
Corollary 3 shows that this is the factor of slowdown in the cutoff of the continuous
time Glauber dynamics). Also, prove that concentration inequalities hold for the
2-dimensional Ising model in the whole high temperature regime
($\beta < \beta_c = \frac{1}{2}\log(1 + \sqrt{2})$). We expect that exchangeable pair couplings similar to
those used in the proofs of Theorem 4.3 of Chatterjee (2005) and Theorem 2.2 could be used
for these problems, but more information about the models has to be incorporated into the
estimates.

5. Preliminary results 

5.1. Basic properties of the total variational distance 

The following exposition is based on Problem 7.11.16 of Grimmett and Stirzaker (2001b)
and its solution in Grimmett and Stirzaker (2001a). The total variational distance of two
measures $\mu_1$ and $\mu_2$ defined on the same measurable space $(\mathcal{X}, \mathcal{F})$ is defined as

$$d_{TV}(\mu_1, \mu_2) = \sup_{S \in \mathcal{F}} |\mu_1(S) - \mu_2(S)|.$$

The following lemma summarizes the basic properties of $d_{TV}$:

Lemma 5.1.

(i) If two random variables $X : \Omega \to \mathcal{X}$ and $Y : \Omega \to \mathcal{X}$ are defined on the same
probability space $(\Omega, \mathcal{F}_\Omega, \mathbb{P})$ such that $X \sim \mu_1$ and $Y \sim \mu_2$, then we have

$$\mathbb{P}[X \ne Y] \ge d_{TV}(\mu_1, \mu_2).$$

(ii) Conversely, if $\mu_1$ and $\mu_2$ are two probability measures on $(\mathcal{X}, \mathcal{F})$, then we can
define a probability space $(\Omega, \mathcal{F}_\Omega, \mathbb{P})$ and random variables $X : \Omega \to \mathcal{X}$ and
$Y : \Omega \to \mathcal{X}$ such that $X \sim \mu_1$, $Y \sim \mu_2$ and

$$\mathbb{P}[X \ne Y] = d_{TV}(\mu_1, \mu_2)$$

(this is called the maximal coupling of $\mu_1$ and $\mu_2$).

Proof. (i) For any $S \in \mathcal{F}$,

$$|\mathbb{P}(X \in S) - \mathbb{P}(Y \in S)| = |\mathbb{P}(X = Y, X \in S) + \mathbb{P}(X \ne Y, X \in S) - \mathbb{P}(X = Y, Y \in S) - \mathbb{P}(X \ne Y, Y \in S)| \le \mathbb{P}[X \ne Y],$$

so taking the supremum in $S$ gives the result.

(ii) We are going to define $X$ and $Y$ from 4 independent variables $B$, $C$, $D$ and $\chi$, $\chi$ being
an indicator variable, and $(\Omega, \mathcal{F}_\Omega, \mathbb{P})$ is going to be the product space containing these
variables.

Let us define the measure $\mu_{12}(\cdot)$ on $(\mathcal{X}, \mathcal{F})$ as

$$\mu_{12}(S) = \frac{\mu_1(S) + \mu_2(S)}{2},$$

then $\mu_1 \ll \mu_{12}$ and $\mu_2 \ll \mu_{12}$, so we can define the Radon-Nikodym derivatives
$f, g : \mathcal{X} \to \mathbb{R}$ as

$$f := \frac{d\mu_1}{d\mu_{12}}, \qquad g := \frac{d\mu_2}{d\mu_{12}}.$$

With these, we can write

$$\mathbb{P}(X \in S) - \mathbb{P}(Y \in S) = \int_{x \in S} (f(x) - g(x))\, d\mu_{12}(x).$$

If we define the set $R := \{x \in \mathcal{X} : f(x) \ge g(x)\}$, then $R^c = \{x \in \mathcal{X} : f(x) < g(x)\}$ and

$$\int_{x \in R^c} (f(x) - g(x))\, d\mu_{12}(x) \le \mathbb{P}(X \in S) - \mathbb{P}(Y \in S) \le \int_{x \in R} (f(x) - g(x))\, d\mu_{12}(x).$$

On the other hand, we can see that

$$\int_{x \in R} (f(x) - g(x))\, d\mu_{12}(x) + \int_{x \in R^c} (f(x) - g(x))\, d\mu_{12}(x) = \int_{x \in \mathcal{X}} (f(x) - g(x))\, d\mu_{12}(x) = 0,$$

thus

$$|\mathbb{P}(X \in S) - \mathbb{P}(Y \in S)| \le \int_{x \in R} (f(x) - g(x))\, d\mu_{12}(x)$$

and

$$d_{TV}(\mu_1, \mu_2) = \int_{x \in R} (f(x) - g(x))\, d\mu_{12}(x).$$

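Let us define $h : \mathcal{X} \to \mathbb{R}$ as $h(x) = \min(f(x), g(x))$, and let us denote
$p := d_{TV}(\mu_1, \mu_2)$, then we can write

$$p = \int_{x \in \mathcal{X}} (f(x) - h(x))\, d\mu_{12}(x).$$

Let us define the laws $\mu_B$, $\mu_C$ and $\mu_D$ on $(\mathcal{X}, \mathcal{F})$ as

$$\mu_B(S) = \int_{x \in S} \frac{h(x)}{1 - p}\, d\mu_{12}(x),$$

$$\mu_C(S) = \int_{x \in S} \frac{f(x) - h(x)}{p}\, d\mu_{12}(x),$$

$$\mu_D(S) = \int_{x \in S} \frac{g(x) - h(x)}{p}\, d\mu_{12}(x).$$

Now let us define $B \sim \mu_B$, $C \sim \mu_C$, $D \sim \mu_D$ and $\chi \sim \mathrm{Bernoulli}(p)$ independent
random variables on the product space

$$(\Omega, \mathcal{F}_\Omega, \mathbb{P}) = (\mathcal{X}^3 \times \{0, 1\},\ \mathcal{F}^3 \times \mathcal{F}_{\{0,1\}},\ \mu_B \times \mu_C \times \mu_D \times \mathrm{Bernoulli}(p)).$$

Now if we set

$$X = (1 - \chi) \cdot B + \chi \cdot C, \qquad Y = (1 - \chi) \cdot B + \chi \cdot D, \tag{5.1}$$

it is easy to verify that $\mathbb{P}(X \ne Y) = p$.

Remark 5.1. When $p = 0$, then $X = Y = B$, so we do not need to define $C$ and $D$.
On the other hand, when $p = 1$, we have $X = C$ and $Y = D$ a.s., so we do not need to
define $B$.

Remark 5.2. It is also possible to choose $\chi \sim \mathrm{Bernoulli}(q)$ for some $q \ge p$; in this
case, we can write

$$X = (1 - \chi) \cdot B' + \chi \cdot C', \qquad Y = (1 - \chi) \cdot B' + \chi \cdot D', \tag{5.2}$$

with $\chi$, $B'$, $C'$ and $D'$ independent random variables, with distributions

$$\mu_{B'}(S) = \int_{x \in S} \frac{h(x)}{1 - p}\, d\mu_{12}(x),$$

$$\mu_{C'}(S) = \int_{x \in S} \left(h(x) \frac{q - p}{1 - p} + f(x) - h(x)\right) \frac{1}{q}\, d\mu_{12}(x),$$

$$\mu_{D'}(S) = \int_{x \in S} \left(h(x) \frac{q - p}{1 - p} + g(x) - h(x)\right) \frac{1}{q}\, d\mu_{12}(x).$$

Now $\mathbb{P}[X \ne Y] \le q$.

□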
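For measures on a finite set, the construction (5.1) can be written out directly. The following is a minimal sketch under the assumption that $\mu_1$ and $\mu_2$ are given as dictionaries of point masses; densities are taken with respect to the counting measure instead of $\mu_{12}$, which does not change the construction. The names `tv_distance` and `maximal_coupling_sampler` are hypothetical.

```python
import random


def tv_distance(mu1, mu2):
    """Total variation distance of two finitely supported probability measures."""
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(s, 0.0) - mu2.get(s, 0.0)) for s in support)


def maximal_coupling_sampler(mu1, mu2, rng):
    """Return a function that draws (X, Y) as in (5.1): X ~ mu1, Y ~ mu2, P(X != Y) = d_TV."""
    support = sorted(set(mu1) | set(mu2))
    h = {s: min(mu1.get(s, 0.0), mu2.get(s, 0.0)) for s in support}  # common part
    excess1 = {s: mu1.get(s, 0.0) - h[s] for s in support}           # proportional to mu_C
    excess2 = {s: mu2.get(s, 0.0) - h[s] for s in support}           # proportional to mu_D
    p = tv_distance(mu1, mu2)

    def draw(weights):
        # rng.choices normalizes the weights, so unnormalized masses are fine.
        items = [s for s in support if weights[s] > 0.0]
        return rng.choices(items, weights=[weights[s] for s in items])[0]

    def sample():
        if rng.random() >= p:  # chi = 0: X = Y = B
            b = draw(h)
            return b, b
        return draw(excess1), draw(excess2)  # chi = 1: X = C, Y = D

    return sample
```

Since $\mu_C$ and $\mu_D$ have disjoint supports, the two coordinates differ exactly when $\chi = 1$, which is why the mismatch probability equals $p$.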



5.2. Concentration by Stein's method of exchangeable pairs 

Suppose that we would like to get a concentration inequality for $f(X)$, $f : \mathcal{X} \to \mathbb{R}$,
where $\mathcal{X}$ is a Polish space, and $X$ is a random variable taking values in $\mathcal{X}$. Suppose
that $\mathbb{E}(f(X)) = 0$. Let $(X, X')$ be an exchangeable pair, $m(\theta) := \mathbb{E}(e^{\theta f(X)})$.
Suppose that $F(x, y) : \mathcal{X}^2 \to \mathbb{R}$ is an antisymmetric function satisfying

$$\mathbb{E}(F(X, X') | X) = f(X). \tag{5.3}$$

Then for $\theta \ge 0$,

$$m'(\theta) = \mathbb{E}(f(X) e^{\theta f(X)}) = \mathbb{E}(F(X, X') e^{\theta f(X)}) = -\mathbb{E}(F(X, X') e^{\theta f(X')}) \tag{5.4}$$

$$= \mathbb{E}\left(F(X, X') \frac{e^{\theta f(X)} - e^{\theta f(X')}}{2}\right). \tag{5.5}$$

In Chatterjee (2005), this is further bounded by

$$\mathbb{E}\left(\frac{\theta}{2} |F(X, X')| |f(X) - f(X')| e^{\theta f(X)}\right),$$

and conditions on $\Delta(X) := \frac{1}{2} \mathbb{E}(|F(X, X')| |f(X) - f(X')| \,|\, X)$ determine the
concentration properties of $f(X)$.

In this paper, we are also going to use (5.5), but instead of taking absolute values, we take
positive and negative parts.

In order to apply the approach to some function $f$, we need to find an antisymmetric
function $F(x, y)$ such that (5.3) is satisfied. This is done in Chatterjee (2005), Chapter 4 by
a method using a Markov chain; we give a summary below.

An exchangeable pair $(X, X')$ automatically defines a reversible Markov kernel $P$ as

$$Pf(x) := \mathbb{E}(f(X') | X = x), \tag{5.6}$$

where $f$ is any function such that $\mathbb{E}|f(X)| < \infty$.

In the following, we are going to construct F(X,X') from (X, X') and /(X), using the 
Markov kernel P. 

Lemma 5.2 (Lemma 4.1 of Chatterjee (2005)). Let $f : \mathcal{X} \to \mathbb{R}$ be a measurable function
such that $\mathbb{E}f(X) = 0$. Suppose there is a finite constant $L$ such that

$$\sum_{k=0}^\infty |P^k f(x) - P^k f(y)| \le L \quad \text{for every } x \text{ and } y. \tag{5.7}$$

Then the function

$$F(x, y) := \sum_{k=0}^\infty (P^k f(x) - P^k f(y))$$

satisfies $F(X, X') = -F(X', X)$ and $\mathbb{E}(F(X, X') | X) = f(X)$.
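On a finite state space the series in Lemma 5.2 can be summed numerically, which gives a quick sanity check of the identity $\mathbb{E}(F(X, X') | X) = f(X)$. The sketch below is illustrative only: the three-state symmetric kernel and the function $f$ are made-up examples (the kernel is reversible with respect to the uniform law, and $f$ has mean zero under it), and the helper names are hypothetical.

```python
def apply_kernel(P, v):
    """Compute Pv, i.e. (Pv)(x) = sum_y P[x][y] * v[y]."""
    n = len(v)
    return [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]


def stein_F(P, f, terms=200):
    """Truncated series F(x, y) = sum_{k < terms} (P^k f(x) - P^k f(y))."""
    n = len(f)
    F = [[0.0] * n for _ in range(n)]
    v = list(f)  # v holds P^k f, starting with k = 0
    for _ in range(terms):
        for x in range(n):
            for y in range(n):
                F[x][y] += v[x] - v[y]
        v = apply_kernel(P, v)
    return F


# Made-up example: symmetric kernel (reversible w.r.t. the uniform law on 3 states).
P = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
f = [1.0, 0.0, -1.0]  # mean zero under the uniform law
F = stein_F(P, f)

# E(F(X, X') | X = x) = sum_y P[x][y] * F[x][y] should reproduce f(x).
conditional_mean = [sum(P[x][y] * F[x][y] for y in range(3)) for x in range(3)]
```

In this example $Pf = 0.3 f$, so the series is geometric and $F(x, y) = (f(x) - f(y))/0.7$; `conditional_mean` then agrees with `f` up to rounding.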

Although Lemma 5.2 gives an explicit expression of F, it is inconvenient to use in practice. 
The following coupling version is more useful: 

Let $\{X(k)\}_{k \ge 0}$ and $\{X'(k)\}_{k \ge 0}$ be two chains from the kernel defined by $(X, X')$,
with arbitrary initial values, and coupled according to some coupling scheme which satisfies
the following property:

P: For every initial value $(x, y)$, and every $k$, the marginal distribution of $X(k)$ depends
only on $x$ and the marginal distribution of $X'(k)$ depends only on $y$.

Under this assumption, the following lemma holds:




Lemma 5.3 (Lemma 4.2 of Chatterjee (2005)). Suppose the chains $\{X(k)\}$ and $\{X'(k)\}$
satisfy the property P described above. Let $f : \mathcal{X} \to \mathbb{R}$ be a function such that
$\mathbb{E}f(X) = 0$. Suppose there exists a finite constant $L$ such that for every $(x, y) \in \mathcal{X}^2$,

$$\sum_{k=0}^\infty |\mathbb{E}(f(X(k)) - f(X'(k)) \,|\, X(0) = x, X'(0) = y)| \le L. \tag{5.8}$$

Then, the function $F$ defined as

$$F(x, y) := \sum_{k=0}^\infty \mathbb{E}(f(X(k)) - f(X'(k)) \,|\, X(0) = x, X'(0) = y)$$

satisfies $F(X, X') = -F(X', X)$ and $\mathbb{E}(F(X, X') | X) = f(X)$.

Remark 5.3. It is useful to start with $X(0) = X$ and $X'(0) = X'$, because we can bound
$F(X, X')$ during the verification of (5.8).

The simplest example is proving McDiarmid's bounded differences inequality for independent
variables:

Example 5.1 (Example from Chapter 4.1 of Chatterjee (2005)). Let $X := (X_1, \ldots, X_n)$ be
a vector with independent components ($X_i$: component $i$; $\{X(k)\}_{k \ge 0}$: Markov chain). Let

$$X^{(r)} := (X_1^{(r)}, \ldots, X_n^{(r)}) \tag{5.9}$$

be independent copies of $X$, for $r \ge 0$. Let $I, I(1), \ldots, I(k), \ldots$ be uniformly distributed
indices in $[n]$, independent of each other and of $X$ and $\{X^{(r)}\}_{r \ge 0}$. Define $X'$ as

$$X'_i = X_i \text{ for } i \ne I \quad \text{and} \quad X'_I = X_I^{(0)}.$$

Now we are ready to construct $X(k)$ and $X'(k)$:

Suppose that $X(0) = x$ and $X'(0) = y$, for $x, y \in \Lambda$. For $k \ge 1$, define $X(k)$ as

$$X_i(k) := X_i(k - 1) \text{ for } i \ne I(k) \quad \text{and} \quad X_{I(k)}(k) := X_{I(k)}^{(k)}.$$

Similarly, for $k \ge 1$, define $X'(k)$ as

$$X'_i(k) := X'_i(k - 1) \text{ for } i \ne I(k) \quad \text{and} \quad X'_{I(k)}(k) := X_{I(k)}(k).$$

With this definition, $\{X(k)\}_{k \ge 0}$ and $\{X'(k)\}_{k \ge 0}$ have the same distribution as the
Markov chain defined by the kernel $P$; moreover $(X(k), X'(k))$ satisfy property P. We
can prove condition (5.8) by the coupon collector's problem.

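Now suppose that we start with $(X(0), X'(0)) = (x, y)$. If $x$ and $y$ differ only at coordinate
$i$, then $|f(X(k)) - f(X'(k))| \le c_i$ for every $k$. Therefore

$$|F(x, y)| = \left|\sum_{k=0}^\infty \mathbb{E}(f(X(k)) - f(X'(k)) \,|\, X(0) = x, X'(0) = y)\right|$$

$$\le \sum_{k=0}^\infty \mathbb{E}(c_i 1[i \notin \{I(1), \ldots, I(k)\}] \,|\, X(0) = x, X'(0) = y) \le n c_i.$$

Consequently, since $X$ and $X'$ differ only in the coordinate $I$, which is uniform on $[n]$, we have

$$\Delta(X) = \frac{1}{2} \mathbb{E}(|(f(X) - f(X')) F(X, X')| \,|\, X) \le \frac{1}{2} \cdot \frac{1}{n} \sum_{i=1}^n c_i \cdot n c_i = \frac{1}{2} \sum_{i=1}^n c_i^2.$$

Theorem 3.3 of Chatterjee (2005) gives us

$$\mathbb{P}(f(X) - \mathbb{E}f(X) \ge t),\ \mathbb{P}(f(X) - \mathbb{E}f(X) \le -t) \le \exp\left(-\frac{t^2}{\sum_{i=1}^n c_i^2}\right).$$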

5.3. Independent case 

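Let $X = (X_1, \ldots, X_n)$ be a vector of independent random variables, taking values in
$\Lambda$. Let $f : \Lambda \to \mathbb{R}$ be a function with $\mathbb{E}f(X) = 0$, and let
$m(\theta) := \mathbb{E}(e^{\theta f(X)})$. For $x \in \Lambda$, $i \le n$, define

$$\alpha_i(x) := f(x) - \inf_{x_i' \in \Lambda_i} f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n).$$

Lemma 5.4. For $\theta \ge 0$,

$$m'(\theta) \le \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I_1, \ldots, I_k\}]\right).$$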



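Proof.

$$m'(\theta) = \mathbb{E}(f(X) e^{\theta f(X)}) = \mathbb{E}(F(X, X') e^{\theta f(X)}) = \frac{1}{2} \mathbb{E}\left(F(X, X')(e^{\theta f(X)} - e^{\theta f(X')})\right)$$

$$\le \mathbb{E}\left((F(X, X'))_+ (e^{\theta f(X)} - e^{\theta f(X')})_+\right)$$

$$= \mathbb{E}\left((F(X, X'))_+ \left(1 - e^{-\theta (f(X) - f(X'))_+}\right) e^{\theta f(X)}\right)$$

$$\le \mathbb{E}\left((F(X, X'))_+ (f(X) - f(X'))_+ \theta e^{\theta f(X)}\right)$$

$$\le \theta\, \mathbb{E}\left(\left(\sum_{k=0}^\infty (f(X(k)) - f(X'(k)))_+\right) (f(X) - f(X'))_+ e^{\theta f(X)}\right)$$

$$\le \sum_{k=0}^\infty \mathbb{E}\left(\alpha_I(X(k)) \alpha_I(X) 1[I \notin \{I_1, \ldots, I_k\}] \theta e^{\theta f(X)}\right)$$

$$= \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I_1, \ldots, I_k\}]\right).$$

□

Lemma 5.5. For $\theta \le 0$, if $f(X) - f(X') \le 1$ a.s., then

$$m'(\theta) \ge -\sum_{k=0}^\infty \mathbb{E}\left(\left(e^{-\theta} - 1\right) e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I_1, \ldots, I_k\}]\right).$$

Proof.

$$m'(\theta) = \frac{1}{2} \mathbb{E}\left(F(X, X')(e^{\theta f(X)} - e^{\theta f(X')})\right)$$

$$\ge -\mathbb{E}\left((F(X, X'))_+ (e^{\theta f(X)} - e^{\theta f(X')})_-\right)$$

$$= -\mathbb{E}\left((F(X, X'))_+ (e^{\theta f(X')} - e^{\theta f(X)})_+\right)$$

$$= -\mathbb{E}\left((F(X, X'))_+ \left(e^{\theta (f(X') - f(X))} - 1\right)_+ e^{\theta f(X)}\right)$$

$$= -\mathbb{E}\left((F(X, X'))_+ \left(e^{-\theta (f(X) - f(X'))_+} - 1\right) e^{\theta f(X)}\right).$$

Since $\theta \le 0$, and $(e^x - 1)/x$ is a monotone increasing function of $x$ for $x \ge 0$, using
$0 \le (f(X) - f(X'))_+ \le 1$, we get

$$\left(e^{-\theta (f(X) - f(X'))_+} - 1\right) \le (f(X) - f(X'))_+ \left(e^{-\theta} - 1\right).$$

Now we proceed the same way as in the proof of Lemma 5.4 to get our claim. □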
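5.4. Weak dependence

Let $X = (X_1, \ldots, X_n)$ be a vector of random variables taking values in $\Lambda$, with Dobrushin
interdependence matrix $A$. Let $f : \Lambda \to \mathbb{R}$ be a function with $\mathbb{E}f(X) = 0$, and
$\alpha : \Lambda \to \mathbb{R}_+^n$ be a vector valued function such that for any $x, y \in \Lambda$,

$$f(x) - f(y) \le \sum_{i=1}^n \alpha_i(x) 1[x_i \ne y_i]. \tag{5.10}$$

Lemma 5.6. For $\theta \ge 0$,

$$m'(\theta) \le \mathbb{E}\left(\sum_{k=0}^\infty [L(k) \cdot \alpha(X(k))]\, \alpha_I(X)\, \theta e^{\theta f(X)}\right).$$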

Proof. The same way as in the proof of Lemma 5.4 we have

$$m'(\theta) \le \mathbb{E}\left(\sum_{k=0}^\infty (f(X(k)) - f(X'(k)))_+ (f(X) - f(X'))_+ \theta e^{\theta f(X)}\right).$$

Using (5.10), we have

$$(f(X) - f(X'))_+ \le \alpha_I(X), \quad \text{and} \quad (f(X(k)) - f(X'(k)))_+ \le L(k) \cdot \alpha(X(k)).$$

□

Lemma 5.7. For $\theta \le 0$, if $f(X) - f(X') \le 1$ a.s., then

$$m'(\theta) \ge -\sum_{k=0}^\infty \mathbb{E}\left(\left(e^{-\theta} - 1\right) e^{\theta f(X)} [L(k) \cdot \alpha(X(k))]\, \alpha_I(X)\right).$$

Proof. We proceed the same way as in the proof of Lemma 5.5 to get

$$m'(\theta) \ge -\sum_{k=0}^\infty \mathbb{E}\left(\left(e^{-\theta} - 1\right) e^{\theta f(X)} \cdot (f(X(k)) - f(X'(k)))_+ (f(X) - f(X'))_+\right),$$

and applying (5.10) gives the result. □



5.5. Additional lemmas 

We will use a part of the proof of Theorem 3.13 of Chatterjee (2005), which proves
concentration in the case when $\Delta(X)$ is not bounded almost surely, but is itself
concentrated. This is similar to Lemma 3.1 of Bousquet (2002) and Lemma 11 of Massart (2000).

Lemma 5.8. For any random variable $V$, and any $L > 0$, we have for every $\theta \in \mathbb{R}$,

$$\mathbb{E}(e^{\theta f(X)} V) \le L^{-1} \log \mathbb{E}(e^{LV})\, m(\theta) + L^{-1} \theta m'(\theta) - L^{-1} m(\theta) \log(m(\theta)).$$

Proof. Denote $u(X) := \frac{e^{\theta f(X)}}{m(\theta)}$.

Let $A, B > 0$ be two random variables with finite variance and $\mathbb{E}(A) = 1$, then

$$\mathbb{E}(A \log(B)) \le \log(\mathbb{E}(AB)),$$

this can be shown by changing the measure and applying Jensen's inequality. Using this, we
have

$$\mathbb{E}(e^{\theta f(X)} V) = L^{-1} m(\theta)\, \mathbb{E}\left(u(X) \left(\log \frac{e^{LV}}{u(X)} + \log u(X)\right)\right)$$

$$\le L^{-1} \log \mathbb{E}(e^{LV})\, m(\theta) + L^{-1} \mathbb{E}\left(e^{\theta f(X)} \log u(X)\right),$$

here we applied our previous inequality with $A = u(X)$ and $B = \frac{e^{LV}}{u(X)}$. Now using the fact
that $\log(u(X)) = \theta f(X) - \log(m(\theta))$, we get the result. □

We will use the following well known result many times in our proofs: 

Lemma 5.9. Let $W$ be a centered random variable with moment generating function $m(\theta)$.
Let $C, D \ge 0$, suppose that $m(\theta)$ is finite and continuously differentiable in $[0, 1/C)$, and
satisfies

$$m'(\theta) \le C\theta m'(\theta) + D\theta m(\theta).$$

Then for $0 \le \theta < 1/C$,

$$\log(m(\theta)) \le \frac{D\theta^2}{2(1 - C\theta)}, \tag{5.11}$$

and for every $t \ge 0$,

$$\mathbb{P}(W \ge t) \le \exp\left(-\frac{t^2}{2(D + Ct)}\right). \tag{5.12}$$

Proof. Rearranging gives

$$(1 - C\theta) m'(\theta) \le D\theta m(\theta),$$

$$\log(m(\theta))' \le \frac{D\theta}{1 - C\theta},$$

$$\log(m(\theta)) \le \int_{x=0}^\theta \frac{Dx}{1 - Cx}\, dx = -\frac{D\theta}{C} - \frac{D \log(1 - C\theta)}{C^2} \le \frac{D\theta^2}{2(1 - C\theta)},$$

using the fact that for $0 \le z < 1$, $-z - \log(1 - z) \le \frac{z^2}{2(1 - z)}$. We get the tail bound by
applying Markov's inequality for $\theta = \frac{t}{D + Ct}$. □
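The last step of this proof is a routine Chernoff-type computation; for convenience it can be spelled out as follows. This is only a worked version of the step stated above, assuming $D + Ct > 0$.

```latex
% Tail bound (5.12) from the moment generating function bound (5.11).
\begin{align*}
\log \mathbb{P}(W \ge t)
  &\le \log m(\theta) - \theta t
   \le \frac{D\theta^2}{2(1 - C\theta)} - \theta t,
   \qquad 0 \le \theta < 1/C. \\
\intertext{Choosing $\theta = t/(D + Ct)$ gives $1 - C\theta = D/(D + Ct)$, hence}
\frac{D\theta^2}{2(1 - C\theta)}
  &= \frac{D t^2}{(D + Ct)^2} \cdot \frac{D + Ct}{2D}
   = \frac{t^2}{2(D + Ct)},
\qquad
\theta t = \frac{t^2}{D + Ct}, \\
\log \mathbb{P}(W \ge t)
  &\le \frac{t^2}{2(D + Ct)} - \frac{t^2}{D + Ct}
   = -\frac{t^2}{2(D + Ct)}.
\end{align*}
```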

6. Proof of main results 
6.1. Independent case 

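Proof of Part 1 of Theorem 2.1. By Lemma 5.4, we have for $\theta \ge 0$,

$$m'(\theta) \le \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}]\right).$$

Now by our assumption, $\alpha_i(X(k)) \le 1$, and using that $g$ is $(a,b)$-self-bounding,

$$m'(\theta) \le \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}]\right)$$

$$\le \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X) \sum_{k=0}^\infty \left(1 - \frac{1}{n}\right)^k\right)$$

$$\le \mathbb{E}\left(\theta e^{\theta f(X)} (a g(X) + b)\right) = \mathbb{E}\left(\theta e^{\theta f(X)} (a f(X) + (a\mathbb{E}g(X) + b))\right)$$

$$= \theta a m'(\theta) + (a\mathbb{E}g(X) + b)\, \theta m(\theta).$$

Applying Lemma 5.9 gives the result. □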
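Proof of Part 2 of Theorem 2.1. By Lemma 5.4, we have for $\theta \ge 0$,

$$m'(\theta) \le \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}]\right). \tag{6.1}$$

Now by the fact that $g$ is weakly $(a,b)$-self-bounding, we have

$$\sum_{i=1}^n \alpha_i(X)^2 \le a g(X) + b, \tag{6.2}$$

$$\sum_{i=1}^n \alpha_i(X(k))^2 \le a g(X(k)) + b. \tag{6.3}$$

We will use the conditional version of the Cauchy-Schwarz inequality: if $A_i, B_i$ are random
variables for $1 \le i \le n$, then

$$\mathbb{E}(A_i B_i | X) \le (\mathbb{E}(A_i^2 | X))^{1/2} \cdot (\mathbb{E}(B_i^2 | X))^{1/2},$$

$$\mathbb{E}\left(\sum_{i=1}^n A_i B_i \,\Big|\, X\right) \le \sum_{i=1}^n (\mathbb{E}(A_i^2 | X))^{1/2} \cdot (\mathbb{E}(B_i^2 | X))^{1/2}.$$

Now writing $A_i = \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}]$ and $B_i = \alpha_i(X(k))$, we get

$$\sum_{i=1}^n \mathbb{E}(\alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}] \,|\, X)$$

$$\le \sum_{i=1}^n \left(\mathbb{E}(\alpha_i(X)^2 1[i \notin \{I(1), \ldots, I(k)\}] \,|\, X)\right)^{1/2} \cdot \left(\mathbb{E}(\alpha_i(X(k))^2 | X)\right)^{1/2}$$

$$= \left(1 - \frac{1}{n}\right)^{k/2} \sum_{i=1}^n \alpha_i(X) \left(\mathbb{E}(\alpha_i(X(k))^2 | X)\right)^{1/2}$$

$$\le \left(1 - \frac{1}{n}\right)^{k/2} \cdot \frac{1}{2} \sum_{i=1}^n \left(\alpha_i(X)^2 + \mathbb{E}(\alpha_i(X(k))^2 | X)\right)$$

$$\le \left(1 - \frac{1}{n}\right)^{k/2} \cdot \frac{1}{2} \mathbb{E}(a g(X) + b + a g(X(k)) + b \,|\, X).$$

Substituting this into (6.1), we get

$$m'(\theta) \le \sum_{k=0}^\infty \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \left(1 - \frac{1}{n}\right)^{k/2} \frac{1}{2} (a g(X) + b + a g(X(k)) + b)\right)$$

$$\le \mathbb{E}\left(\theta e^{\theta f(X)} \cdot \frac{1}{n} \sum_{k=0}^\infty \left(1 - \frac{1}{n}\right)^{k/2} (a g(X) + b)\right)$$

$$\le \mathbb{E}\left(\theta e^{\theta f(X)}\, 2(a g(X) + b)\right) = \mathbb{E}\left(\theta e^{\theta f(X)} (2a f(X) + 2a\mathbb{E}g(X) + 2b)\right)$$

$$= \theta 2a m'(\theta) + (2a\mathbb{E}g(X) + 2b)\, \theta m(\theta).$$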
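The factor $2$ in the third line comes from summing the geometric series in $k$. A short worked version of this step, added here for convenience, is:

```latex
% Geometric series used when summing over k.
\begin{align*}
\frac{1}{n} \sum_{k=0}^{\infty} \left(1 - \frac{1}{n}\right)^{k/2}
  &= \frac{1}{n\left(1 - \sqrt{1 - 1/n}\right)}
   = \frac{1 + \sqrt{1 - 1/n}}{n \left(1 - (1 - 1/n)\right)} \\
  &= 1 + \sqrt{1 - \frac{1}{n}} \le 2.
\end{align*}
```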

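Here we have used the fact that for $\theta \ge 0$,

$$\mathbb{E}(e^{\theta f(X)} f(X(k))) \le \mathbb{E}(e^{\theta f(X)} f(X)), \tag{6.4}$$

since using the exchangeability of $f(X)$ and $f(X(k))$,

$$\mathbb{E}\left(e^{\theta f(X)} (f(X) - f(X(k)))\right) = \mathbb{E}\left(e^{\theta f(X(k))} (f(X(k)) - f(X))\right)$$

$$= \frac{1}{2} \mathbb{E}\left(\left(e^{\theta f(X)} - e^{\theta f(X(k))}\right)(f(X) - f(X(k)))\right) \ge 0,$$

since $e^{\theta f(X)} - e^{\theta f(X(k))}$ and $f(X) - f(X(k))$ always have the same sign.

We conclude by applying Lemma 5.9. □

Proof of Part 3 of Theorem 2.1. Now we will bound the lower tail, so suppose that $\theta \le 0$.
By Lemma 5.5,

$$m'(\theta) \ge -\sum_{k=0}^\infty \mathbb{E}\left(\left(e^{-\theta} - 1\right) e^{\theta f(X)} \cdot \frac{1}{n} \sum_{i=1}^n \alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}]\right).$$

In Part 2, we proved that

$$\sum_{i=1}^n \mathbb{E}(\alpha_i(X(k)) \alpha_i(X) 1[i \notin \{I(1), \ldots, I(k)\}] \,|\, X) \le \left(1 - \frac{1}{n}\right)^{k/2} \cdot \frac{1}{2} \mathbb{E}(a g(X) + b + a g(X(k)) + b \,|\, X),$$

so we get

$$m'(\theta) \ge -\mathbb{E}\left(\left(e^{-\theta} - 1\right) e^{\theta f(X)} \cdot \frac{1}{n} \sum_{k=0}^\infty \left(1 - \frac{1}{n}\right)^{k/2} \frac{1}{2} \left(a f(X) + a f(X(k)) + 2b + 2a\mathbb{E}g(X)\right)\right).$$

The terms involving $f(X(k))$ cause some difficulty: although we can show (the same way as
in Part 2) that

$$-\mathbb{E}(e^{\theta f(X)} f(X(k))) \le -\mathbb{E}(e^{\theta f(X)} f(X)),$$

for us the inequality in the other direction would be more convenient.

Nevertheless, we can use the concentration properties of $f(X(k))$ from Part 2 to bound
this term: by Lemma 5.8, for any $L > 0$,

$$\mathbb{E}(e^{\theta f(X)} f(X(k))) \le L^{-1} \log \mathbb{E}(e^{L f(X(k))})\, m(\theta) + L^{-1} \theta m'(\theta).$$

Now by exchangeability $\mathbb{E}(e^{L f(X(k))}) = \mathbb{E}(e^{L f(X)}) = m(L)$, and we can use the bound from
Part 2 to get that for $0 \le L < 1/(2a)$,

$$\log(m(L)) \le \frac{(a\mathbb{E}g(X) + b) L^2}{1 - 2aL},$$

$$\mathbb{E}(e^{\theta f(X)} f(X(k))) \le \frac{(a\mathbb{E}g(X) + b) L}{1 - 2aL} m(\theta) + L^{-1} \theta m'(\theta) = \mathbb{E}\left(e^{\theta f(X)} \left(\frac{(a\mathbb{E}g(X) + b) L}{1 - 2aL} + L^{-1} \theta f(X)\right)\right).$$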

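Substituting this back to (6.5), and summing up in $k$ as previously, we get

\[
m'(\theta) \ge -(e^{-\theta} - 1) \cdot \mathbb{E}\left[e^{\theta f(X)}\left(2a\mathbb{E}g(X) + 2b + \frac{a(a\mathbb{E}g(X) + b)L}{1 - 2aL}\right) + f(X)\, e^{\theta f(X)}\big(a + aL^{-1}\theta\big)\right].
\]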
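A convenient choice for $L$, which makes the inequality tractable, is $L = -\theta$. With this choice, for $0 > \theta > -\frac{1}{2a}$ we get

\[
m'(\theta) \ge -(e^{-\theta} - 1)\left(2a\mathbb{E}g(X) + 2b - \frac{a(a\mathbb{E}g(X) + b)\theta}{1 + 2a\theta}\right) m(\theta),
\]

\[
\log(m(\theta))' \ge -(e^{-\theta} - 1)\left(2a\mathbb{E}g(X) + 2b - \frac{a(a\mathbb{E}g(X) + b)\theta}{1 + 2a\theta}\right).
\]

Suppose that $0 > \theta > -\frac{1}{4a}$, then $1 + 2a\theta > 1/2$, so

\[
\log(m(\theta))' \ge -(e^{-\theta} - 1)(2 - 2a\theta)(a\mathbb{E}g(X) + b). \tag{6.5}
\]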

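Now we consider two cases, depending on the size of $a$. The function $(e^x - 1)/x$ is increasing for positive $x$, so for $0 > \theta > -\frac{1}{4a}$ we can write

\[
-(e^{-\theta} - 1)(2 - 2a\theta) \ge \frac{e^{1/(4a)} - 1}{1/(4a)} \cdot \frac{5}{2}\,\theta,
\]

\[
\log(m(\theta))' \ge \frac{e^{1/(4a)} - 1}{1/(4a)} \cdot \frac{5}{2}\,(a\mathbb{E}g(X) + b)\,\theta,
\]

\[
\log(m(\theta)) \le \frac{e^{1/(4a)} - 1}{1/(4a)} \cdot \frac{5}{4}\,(a\mathbb{E}g(X) + b)\,\theta^2 \le 2(a\mathbb{E}g(X) + b)\,\theta^2
\]

whenever

\[
\frac{e^{1/(4a)} - 1}{1/(4a)} \le \frac{8}{5}. \tag{6.6}
\]

Let us denote the unique positive solution of

\[
\frac{e^{1/(4a)} - 1}{1/(4a)} = \frac{8}{5}
\]

by $a_c \approx 0.285256$, then for each $a \ge a_c$, (6.6) holds.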
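The numerical value of $a_c$ can be reproduced with a few lines of code. This is a minimal sketch, assuming plain bisection is accurate enough; `ratio` and `solve_Kc` are hypothetical helper names, and the substitution $K = 1/(4a)$ is used to turn (6.6) into an equation in $K$.

```python
import math

def ratio(K):
    """(e^K - 1) / K, which is increasing for K > 0."""
    return math.expm1(K) / K

def solve_Kc(target=8.0 / 5.0, lo=1e-9, hi=5.0, iters=200):
    """Bisection for the unique K > 0 with ratio(K) = target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid (or at mid)
    return 0.5 * (lo + hi)

K_c = solve_Kc()          # solves (e^K - 1)/K = 8/5
a_c = 1.0 / (4.0 * K_c)   # back-substitution K = 1/(4a)
```

Since `ratio` is increasing and $K = 1/(4a)$ decreases in $a$, every $a \ge a_c$ gives `ratio(1/(4a)) <= 8/5`, which is the statement that (6.6) holds.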

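Using Markov's inequality, we have that for $0 \le t \le \mathbb{E}g(X)$, $0 > \theta > -\frac{1}{4a}$,

\[
\log \mathbb{P}(f(X) \le -t) \le \log(m(\theta)) + t\theta \le 2(a\mathbb{E}g(X) + b)\theta^2 + \theta t,
\]

which takes its minimum at

\[
\theta_{\min} = \frac{-t}{4(a\mathbb{E}g(X) + b)},
\]

which satisfies $0 > \theta_{\min} > -\frac{1}{4a}$, and thus

\[
\log \mathbb{P}(f(X) \le -t) \le \frac{-t^2}{8(a\mathbb{E}g(X) + b)}. \tag{6.7}
\]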
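The minimisation leading to (6.7) is a one-line calculus exercise, and it can also be checked numerically. In the sketch below the values of `a`, `b`, `Eg` and `t` are arbitrary illustrative assumptions, and `C` stands for $a\mathbb{E}g(X) + b$.

```python
def quadratic_bound(theta, t, C):
    """Right-hand side 2*C*theta^2 + theta*t, with C standing for a*E[g(X)] + b."""
    return 2.0 * C * theta ** 2 + theta * t

def theta_min(t, C):
    """Minimiser of the quadratic bound in theta."""
    return -t / (4.0 * C)

a, b, Eg = 1.0, 0.5, 3.0   # illustrative values only
C = a * Eg + b
t = 2.0                    # any 0 <= t <= Eg

th = theta_min(t, C)
value_at_min = quadratic_bound(th, t, C)
closed_form = -t ** 2 / (8.0 * C)

# Brute-force comparison over a grid of theta in (-1/(4a), 0).
grid = [-k / (4.0 * a * 1000) for k in range(1, 1000)]
grid_min = min(quadratic_bound(x, t, C) for x in grid)
```

The minimiser stays inside $(-\frac{1}{4a}, 0)$ because $t \le \mathbb{E}g(X)$ implies $t/(4(a\mathbb{E}g(X)+b)) \le 1/(4a)$.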
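Finally, we need to tackle the case when $a < a_c$. Going back to equation (6.5), we can write that for $0 > \theta > -\frac{1}{4a}$,

\[
\log(m(\theta))' \ge -(e^{-\theta} - 1)\,\frac{5}{2}(a\mathbb{E}g(X) + b),
\]
\[
\log(m(\theta)) \le (e^{-\theta} + \theta - 1)\,\frac{5}{2}(a\mathbb{E}g(X) + b).
\]

Let us write $C := \frac{5}{2}(a\mathbb{E}g(X) + b)$, then by Markov's inequality, we have that for $0 > \theta > -\frac{1}{4a}$, $0 \le t \le \mathbb{E}g(X)$,

\[
\log(\mathbb{P}(f(X) \le -t)) \le \log(m(\theta)) + \theta t \le (e^{-\theta} + \theta - 1)\,C + \theta t.
\]

The minimum of the right hand side is taken at

\[
\theta_{\min} = -\log\left(1 + \frac{t}{C}\right) \ge -\log\left(1 + \frac{2}{5} \cdot \frac{1}{a}\right),
\]

which satisfies $0 > \theta_{\min} > -\frac{1}{4a}$ whenever $a < a_c$. Thus, in this case we have

\[
\log(\mathbb{P}(f(X) \le -t)) \le \left(\frac{t}{C} - \log\left(1 + \frac{t}{C}\right)\right) C - \log\left(1 + \frac{t}{C}\right) t
= C\left(\frac{t}{C} - \log\left(1 + \frac{t}{C}\right)\left(1 + \frac{t}{C}\right)\right).
\]

Now let us take a look at the function $x - \log(1 + x)(1 + x)$ for positive $x$. We can easily check that this is negative, and

\[
x - \log(1 + x)(1 + x) \le -\frac{x^2}{2 + (2/3)x},
\]

so

\[
\log(\mathbb{P}(f(X) \le -t)) \le -\frac{t^2}{2C + (2/3)t} = -\frac{t^2}{5(a\mathbb{E}g(X) + b) + (2/3)t}.
\]

□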
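The elementary inequality $x - \log(1+x)(1+x) \le -x^2/(2 + (2/3)x)$ used in the last step can be checked on a grid. This is only a numerical sanity check, not a proof; the function names `bennett` and `bernstein` and the grid are assumptions made for the example.

```python
import math

def bennett(x):
    """x - (1 + x) * log(1 + x), the exact exponent after optimising in theta."""
    return x - (1.0 + x) * math.log1p(x)

def bernstein(x):
    """-x^2 / (2 + (2/3) x), the simpler upper bound."""
    return -x * x / (2.0 + (2.0 / 3.0) * x)

xs = [0.01 * k for k in range(1, 5001)]   # grid on (0, 50]
worst_gap = max(bennett(x) - bernstein(x) for x in xs)
all_negative = all(bennett(x) < 0.0 for x in xs)
```

With $x = t/C$, multiplying the inequality by $C$ gives exactly the bound $-t^2/(2C + (2/3)t)$.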



6.2. Weak dependence 

6.2.1. Coupling Scheme 

Here we will define a coupling scheme for

\[
\{X(k), X'(k)\}_{k \ge 0},
\]

with the condition that $\|A\|_\infty < 1$. Suppose we have already defined

\[
X(0), \ldots, X(k) \quad \text{and} \quad X'(0), \ldots, X'(k),
\]

and that $X(k) = x$, $X'(k) = y$. Then let $I(k+1)$ be uniformly chosen from $[n]$, independent of the previously defined variables. Then we are going to update $X_{I(k+1)}(k+1)$ and $X'_{I(k+1)}(k+1)$.

In order to do this, let us write

\[
\nu_1 := \mu_{I(k+1)}(\cdot \mid x_{-I(k+1)}) \quad \text{and} \quad \nu_2 := \mu_{I(k+1)}(\cdot \mid y_{-I(k+1)}),
\]

then the same way as in Section 5.1, we can define $B(k+1)$, $C(k+1)$, $D(k+1)$, $\chi(k+1)$ as conditionally independent of each other and all the previously defined random variables given $x_{-I(k+1)}$ and $y_{-I(k+1)}$. By Remark 5.2 we can choose $\chi(k+1) \sim \mathrm{Bernoulli}(q)$ for any $q \ge d_{TV}(\nu_1, \nu_2)$.

We want to be able to bound $\chi(k+1)$ by some linear expression. First let us define the $n$-vector $\xi(k+1)$ as

\[
\xi(k+1) := e_i \text{ with probability } a_{I(k+1),i} \ (i \in [n]), \text{ otherwise } \xi(k+1) := 0. \tag{6.8}
\]

Here $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, an $n$-vector having 1 in coordinate $i$ and 0 elsewhere. We suppose that $\xi(k+1)$ is independent of everything else we defined previously (except for $I(k+1)$). Now let us define

\[
\chi(k+1) := \xi(k+1) \cdot L(k),
\]

so $\chi(k+1) \sim \mathrm{Bernoulli}(q)$ with $q = \sum_{i=1}^{n} a_{I(k+1),i} L_i(k) \ge d_{TV}(\nu_1, \nu_2)$. Then by Remark 5.2 we can define

\[
X_{I(k+1)}(k+1) := (1 - \chi(k+1))B(k+1) + \chi(k+1)C(k+1),
\]

and

\[
X'_{I(k+1)}(k+1) := (1 - \chi(k+1))B(k+1) + \chi(k+1)D(k+1),
\]

and for all $i \ne I(k+1)$, $X_i(k+1) := X_i(k)$ and $X'_i(k+1) := X'_i(k)$.

It is easy to verify (by induction) that this coupling scheme satisfies Property P. The advantage of this coupling scheme is that we can now write

\[
L(k+1) \le M_{I(k+1),\xi(k+1)} \cdot L(k), \tag{6.9}
\]

where $M_{i,\xi(k+1)}$ is an $n \times n$ matrix which is basically an identity matrix except the $i$th row, which equals $\xi(k+1)$, for example:

\[
M_{3,(1,0,0,0,0)} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]

By repeating the same argument, we get

\[
L(k) \le M_{I(k),\xi(k)} \cdot \ldots \cdot M_{I(1),\xi(1)} \cdot L(0). \tag{6.10}
\]
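The matrices in (6.9) and (6.10) are easy to build explicitly. The sketch below is illustrative only: it uses 0-based indices (so the paper's row 3 is index 2), and it reads $L(k)$ as a 0/1 vector, which is an assumption made here because $L$ is defined in an earlier section. The helper names `update_matrix` and `mat_vec` are hypothetical.

```python
def update_matrix(n, i, xi):
    """Identity matrix of size n whose i-th row (0-based) is replaced by xi."""
    M = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    M[i] = list(xi)
    return M

def mat_vec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

# The example from the text: n = 5, row 3 (index 2) replaced by (1, 0, 0, 0, 0).
M = update_matrix(5, 2, (1, 0, 0, 0, 0))

# Start from a vector supported on the first coordinate and apply one step of (6.9).
L0 = [1, 0, 0, 0, 0]
L1 = mat_vec(M, L0)

# A second step with xi = 0 at index 0 clears that coordinate.
M2 = update_matrix(5, 0, (0, 0, 0, 0, 0))
L2 = mat_vec(M2, L1)
```

Only the updated coordinate changes in each step: it takes the value $\xi \cdot L$, while every other coordinate is copied unchanged.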
6.2.2. Proof under Dobrushin condition

Proof of Part 1 of Theorem 2.2. For $\theta > 0$, using Lemma 5.6, we have

\[
m'(\theta) \le \mathbb{E}\left(\sum_{k=0}^{\infty} [L(k) \cdot \alpha(X(k))]\, \alpha_I(X)\, \theta e^{\theta f(X)}\right).
\]

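Let $\{X(k), X'(k)\}_{k \ge 0}$ be defined as in our coupling scheme, then using (6.10), and the fact that $L(0) \le e_I$, we can write

\[
\begin{aligned}
\mathbb{E}\big([L(k) \cdot \alpha(X(k))]\, \alpha_I(X) \,\big|\, X\big)
&\le \mathbb{E}\big(\big(M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\, e_I\big) \cdot \alpha(X(k))\, \alpha_I(X) \,\big|\, X\big) \\
&\le \frac{1}{n}\,\mathbb{E}\big(\alpha(X(k))^{T} \big(M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\big)\, \alpha(X) \,\big|\, X\big) \\
&\le \frac{1}{n}\,\mathbb{E}\big(\|\alpha(X(k))\|_\infty\, \|M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\, \alpha(X)\|_1 \,\big|\, X\big).
\end{aligned}
\]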

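Now using the fact that for self-bounding functions $\|\alpha(X(k))\|_\infty \le 1$, that the elements of $M_{I(k),\xi(k)}$ and $L(k)$ are nonnegative for every $k$, and the linearity of the $L_1$ norm, we can take the expectation inside the norm, so

\[
\begin{aligned}
\mathbb{E}\big([L(k) \cdot \alpha(X(k))]\, \alpha_I(X) \,\big|\, X\big)
&\le \frac{1}{n}\, \big\|\mathbb{E}\big(M_{I(k),\xi(k)} \cdot \ldots \cdot M_{I(1),\xi(1)} \,\big|\, X\big)\big\|_1\, \|\alpha(X)\|_1 \\
&\le \frac{1}{n}\, \big\|\mathbb{E}\big(M_{I(1),\xi(1)} \,\big|\, X\big)\big\|_1^{k}\, (a g(X) + b) \\
&\le \frac{1}{n}\, \left\|\left(1 - \frac{1}{n}\right)E + \frac{1}{n}A\right\|_1^{k} (a g(X) + b) \\
&\le \frac{1}{n}\left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k} (a f(X) + a\mathbb{E}g(X) + b),
\end{aligned}
\]

where $E$ denotes the $n \times n$ identity matrix.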
n \ n n 



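Now we can sum up in $k$, and get that

\[
m'(\theta) \le \sum_{k=0}^{\infty} \frac{1}{n}\left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k} \mathbb{E}\big((a f(X) + a\mathbb{E}g(X) + b)\,\theta e^{\theta f(X)}\big),
\]

\[
m'(\theta) \le \frac{1}{1 - \|A\|_1}\big(a\theta\, m'(\theta) + (a\mathbb{E}g(X) + b)\,\theta\, m(\theta)\big).
\]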

Applying Lemma 5.9 proves our claim. □

Proof of Part 2 of Theorem 2.2. As in Part 1, we have that for $\theta > 0$,

\[
m'(\theta) \le \mathbb{E}\left(\sum_{k=0}^{\infty} [L(k) \cdot \alpha(X(k))]\, \alpha_I(X)\, \theta e^{\theta f(X)}\right)
\]

and
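\[
\begin{aligned}
\mathbb{E}\big([L(k) \cdot \alpha(X(k))]\, \alpha_I(X) \,\big|\, X\big)
&\le \frac{1}{n}\,\mathbb{E}\big(\alpha(X(k))^{T} \big(M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\big)\, \alpha(X) \,\big|\, X\big) \\
&\le \frac{1}{n}\,\mathbb{E}\big(\|\alpha(X(k))\|_2\, \|M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\, \alpha(X)\|_2 \,\big|\, X\big) \\
&\le \frac{1}{n}\,\mathbb{E}\big(\|\alpha(X(k))\|_2^2 \,\big|\, X\big)^{1/2}\, \mathbb{E}\big(\|M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\, \alpha(X)\|_2^2 \,\big|\, X\big)^{1/2} \\
&\le \frac{1}{n}\,\mathbb{E}\big(a g(X(k)) + b \,\big|\, X\big)^{1/2} \\
&\qquad \cdot\, \mathbb{E}\big(\alpha(X)^{T} M_{I(1),\xi(1)}^{T} \cdots M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}\, \alpha(X) \,\big|\, X\big)^{1/2} \\
&\le \frac{1}{n}\,\mathbb{E}\big(a g(X(k)) + b \,\big|\, X\big)^{1/2} \\
&\qquad \cdot\, \big(\alpha(X)^{T}\, \mathbb{E}\big(M_{I(1),\xi(1)}^{T} \cdots M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)} \,\big|\, X\big)\, \alpha(X)\big)^{1/2} \\
&\le \frac{1}{n}\,\mathbb{E}\big(a g(X(k)) + b \,\big|\, X\big)^{1/2} (a g(X) + b)^{1/2} \\
&\qquad \cdot\, \big\|\mathbb{E}\big(M_{I(1),\xi(1)}^{T} \cdots M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)} \,\big|\, X\big)\big\|_2^{1/2}.
\end{aligned}
\]

Now for example

\[
M_{3,(1,0,0,0,0)}^{T} \cdot M_{3,(1,0,0,0,0)} =
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
2 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
\]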



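so $M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)}$ is diagonal, therefore it is easy to see that

\[
M_{I(1),\xi(1)}^{T} \cdots M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)}
\]

is also diagonal. Moreover, by denoting the $n \times n$ matrix of only one 1 at position $i,j$ and zeros elsewhere by $H(i,j)$, writing $H(i) := H(i,i)$, and letting $E$ denote the $n \times n$ identity matrix, we can write

\[
\begin{aligned}
&\mathbb{E}\big(M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \,\big|\, X, I(1), \xi(1), \ldots, I(k-1), \xi(k-1)\big)
= \mathbb{E}\big(M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \,\big|\, X\big) \\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{n} a_{ij}\,(E - H(i) + H(i,j))^{T}(E - H(i) + H(i,j)) + \Big(1 - \sum_{j=1}^{n} a_{ij}\Big)(E - H(i))^{T}(E - H(i))\right] \\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\left[\sum_{j=1}^{n} a_{ij}\,(E - H(i) + H(j)) + \Big(1 - \sum_{j=1}^{n} a_{ij}\Big)(E - H(i))\right] \\
&\quad = \left(1 - \frac{1}{n}\right)E + \frac{1}{n}\sum_{j=1}^{n}\Big(\sum_{i=1}^{n} a_{ij}\Big) H(j).
\end{aligned}
\]

Now using the conditions of our theorem, we have $\sum_{i=1}^{n} a_{ij} \le \|A\|_1 < 1$ for every $j$, so we can write

\[
\mathbb{E}\big(M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \,\big|\, X, I(1), \xi(1), \ldots, I(k-1), \xi(k-1)\big) \le \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right) E,
\]

where the inequality between these diagonal matrices is understood entrywise.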
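The identity $(E - H(i) + H(i,j))^{T}(E - H(i) + H(i,j)) = E - H(i) + H(j)$ and the resulting formula for the conditional expectation can be verified with exact rational arithmetic. The sketch below is illustrative: the $3 \times 3$ matrix `A` is an arbitrary assumption (any nonnegative matrix with row sums at most 1 would do), indices are 0-based, and all helper names are hypothetical.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def update_matrix(n, i, j=None):
    """E - H(i) + H(i, j); with j=None the i-th row is zero (the case xi = 0)."""
    M = identity(n)
    M[i] = [Fraction(0)] * n
    if j is not None:
        M[i][j] = Fraction(1)
    return M

def gram(M):
    """Return M^T M."""
    n = len(M)
    return [[sum(M[r][a] * M[r][b] for r in range(n)) for b in range(n)]
            for a in range(n)]

def expected_gram(A):
    """E[M^T M] when I is uniform on rows and xi = e_j with probability A[I][j]."""
    n = len(A)
    total = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        cases = [(A[i][j], j) for j in range(n)] + [(1 - sum(A[i]), None)]
        for prob, j in cases:
            G = gram(update_matrix(n, i, j))
            for r in range(n):
                for c in range(n):
                    total[r][c] += Fraction(1, n) * prob * G[r][c]
    return total

def closed_form(A):
    """(1 - 1/n) E + (1/n) * diag(column sums of A)."""
    n = len(A)
    out = identity(n)
    for j in range(n):
        col_sum = sum(A[i][j] for i in range(n))
        out[j][j] = 1 - Fraction(1, n) + Fraction(1, n) * col_sum
    return out

A = [[Fraction(0), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(0), Fraction(1, 6)],
     [Fraction(1, 5), Fraction(1, 5), Fraction(0)]]

example_gram = gram(update_matrix(5, 2, 0))   # the 5 x 5 example from the text
```

Because every Gram matrix is diagonal, the largest diagonal entry of the expectation is $1 - \frac{1}{n} + \frac{1}{n}\max_j \sum_i a_{ij}$, which is where $\|A\|_1$ enters.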



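By repeating this, we get that

\[
\big\|\mathbb{E}\big(M_{I(1),\xi(1)}^{T} \cdots M_{I(k),\xi(k)}^{T} M_{I(k),\xi(k)} \cdots M_{I(1),\xi(1)} \,\big|\, X\big)\big\|_2^{1/2} \le \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2},
\]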

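so summing up in $k$, we have

\[
\begin{aligned}
m'(\theta) &\le \frac{1}{n}\,\mathbb{E}\left(\sum_{k=0}^{\infty} \mathbb{E}\big(a g(X(k)) + b \,\big|\, X\big)^{1/2} (a g(X) + b)^{1/2} \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2} \theta e^{\theta f(X)}\right) \\
&\le \frac{1}{2n}\,\mathbb{E}\left(\sum_{k=0}^{\infty} \big(a f(X(k)) + a f(X) + 2b + 2a\mathbb{E}g(X)\big) \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2} \theta e^{\theta f(X)}\right) \\
&\le \frac{1}{n}\,\mathbb{E}\left(\sum_{k=0}^{\infty} \big(a f(X) + b + a\mathbb{E}g(X)\big) \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2} \theta e^{\theta f(X)}\right) \\
&\le \mathbb{E}\left(\frac{2}{1 - \|A\|_1}\big(a f(X) + b + a\mathbb{E}g(X)\big)\,\theta e^{\theta f(X)}\right).
\end{aligned}
\]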

D. Paulin/ Concentration for Weak Dependence by Stein's Method 27 

Here we have used (6.4), just as in the independent case. Applying Lemma 5.9 with $C = \frac{2a}{1 - \|A\|_1}$ and $D = \frac{2(a\mathbb{E}g(X) + b)}{1 - \|A\|_1}$ gives the result. □
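The two geometric sums used in Parts 1 and 2 can be checked numerically. This is a sketch with arbitrary illustrative values for `n` and `norm_A` (both assumptions), truncating the infinite sums at a large number of terms; `weighted_sum` is a hypothetical helper name.

```python
def weighted_sum(r, exponent_scale, n, terms=20000):
    """(1/n) * sum_{k < terms} r^(exponent_scale * k), a truncated geometric sum."""
    return sum(r ** (exponent_scale * k) for k in range(terms)) / n

n = 10
norm_A = 0.4                      # stands for ||A||_1 < 1
r = 1 - 1 / n + norm_A / n        # common ratio 1 - 1/n + ||A||_1 / n

part1 = weighted_sum(r, 1.0, n)   # exponent k, as in Part 1
part2 = weighted_sum(r, 0.5, n)   # exponent k/2, as in Part 2

part1_limit = 1 / (1 - norm_A)    # exact value of the Part 1 sum
part2_bound = 2 / (1 - norm_A)    # upper bound used in Part 2
```

The Part 2 bound follows from $1 - \sqrt{r} \ge (1 - r)/2$, which costs the factor 2 in the constants $C$ and $D$ compared with Part 1.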

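Proof of Part 3 of Theorem 2.2. Now we will bound the lower tail, so suppose that $\theta < 0$. By Lemma 5.7,

\[
m'(\theta) \ge -\sum_{k=0}^{\infty} \mathbb{E}\big((e^{-\theta} - 1)\, e^{\theta f(X)}\, [L(k) \cdot \alpha(X(k))]\, \alpha_I(X)\big).
\]

In Part 2, we proved that

\[
\mathbb{E}\big([L(k) \cdot \alpha(X(k))]\, \alpha_I(X) \,\big|\, X\big) \le \frac{1}{n}\,\mathbb{E}\left(\frac{a f(X(k)) + a f(X) + 2b + 2a\mathbb{E}g(X)}{2} \,\Big|\, X\right) \left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2}.
\]

By summing up in $k$, we get

\[
m'(\theta) \ge -(e^{-\theta} - 1) \sum_{k=0}^{\infty} \frac{1}{n}\left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2} \cdot \mathbb{E}\left(e^{\theta f(X)}\, \frac{a f(X(k)) + a f(X) + 2b + 2a\mathbb{E}g(X)}{2}\right).
\]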

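As in the independent case, by Lemma 5.8, for any $L > 0$,

\[
\mathbb{E}\big(e^{\theta f(X)} f(X(k))\big) \le L^{-1} \log \mathbb{E}\big(e^{L f(X(k))}\big)\, m(\theta) + L^{-1}\theta\, m'(\theta),
\]

and by Part 2, for $0 < L < \frac{1 - \|A\|_1}{2a}$,

\[
\log \mathbb{E}\big(e^{L f(X(k))}\big) = \log(m(L)) \le \frac{(a\mathbb{E}g(X) + b)L^2}{1 - \|A\|_1 - 2aL},
\]

so we have

\[
\mathbb{E}\big(e^{\theta f(X)}\, a f(X(k))\big) \le \frac{a(a\mathbb{E}g(X) + b)L}{1 - \|A\|_1 - 2aL}\, m(\theta) + aL^{-1}\theta\, m'(\theta).
\]

By the convenient choice of $L = -\theta$, we get that for $0 > \theta > -\frac{1 - \|A\|_1}{2a}$,

\[
\mathbb{E}\big(e^{\theta f(X)} (f(X(k)) + f(X))\big) \le -\frac{(a\mathbb{E}g(X) + b)\theta}{1 - \|A\|_1 + 2a\theta}\, m(\theta),
\]

so for $0 > \theta > -\frac{1 - \|A\|_1}{2a}$,

\[
\begin{aligned}
m'(\theta) &\ge -(e^{-\theta} - 1)\, \frac{1}{n}\sum_{k=0}^{\infty}\left(1 - \frac{1}{n} + \frac{1}{n}\|A\|_1\right)^{k/2} \cdot \frac{1}{2}\left(2a\mathbb{E}g(X) + 2b - \frac{a(a\mathbb{E}g(X) + b)\theta}{1 - \|A\|_1 + 2a\theta}\right) m(\theta) \\
&\ge -(e^{-\theta} - 1)\, \frac{1}{1 - \|A\|_1}\left(2a\mathbb{E}g(X) + 2b - \frac{a(a\mathbb{E}g(X) + b)\theta}{1 - \|A\|_1 + 2a\theta}\right) m(\theta),
\end{aligned}
\]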
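which implies (2.3). Suppose that $0 > \theta > -\frac{1 - \|A\|_1}{4a}$, then $1 - \|A\|_1 + 2a\theta > \frac{1 - \|A\|_1}{2}$, so

\[
m'(\theta) \ge -(e^{-\theta} - 1)\, \frac{1}{1 - \|A\|_1}\left(2 - \frac{2a\theta}{1 - \|A\|_1}\right)(a\mathbb{E}g(X) + b)\, m(\theta). \tag{6.11}
\]

As in the independent case, now we will separate two cases depending on the size of $a$. First, let $K := \frac{1 - \|A\|_1}{4a}$, then for $0 > \theta > -K$, $(e^{-\theta} - 1) \le -\theta\, \frac{e^{K} - 1}{K}$ and $-\frac{2a\theta}{1 - \|A\|_1} < \frac{1}{2}$, so

\[
m'(\theta) \ge \theta \cdot \frac{e^{K} - 1}{K} \cdot \frac{5/2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b)\, m(\theta),
\]

\[
\log m(\theta) \le \theta^2 \cdot \frac{e^{K} - 1}{K} \cdot \frac{5/4}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b) \le \frac{2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b)\,\theta^2
\]

whenever

\[
\frac{e^{K} - 1}{K} \le \frac{8}{5}. \tag{6.12}
\]

Let us denote the unique positive solution of the equation

\[
\frac{e^{K} - 1}{K} = \frac{8}{5}
\]

by $K_c \approx 0.876405$, then for $K \le K_c$, (6.12) holds. This means that for $a \ge \frac{1 - \|A\|_1}{4K_c} = (1 - \|A\|_1)\,a_c$, (6.12) holds.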

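In this case, using Markov's inequality, we get that for $0 \le t \le \mathbb{E}g(X)$, $0 > \theta > -\frac{1 - \|A\|_1}{4a}$,

\[
\log \mathbb{P}(f(X) \le -t) \le \log(m(\theta)) + \theta t \le \frac{2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b)\,\theta^2 + \theta t,
\]

which takes its minimum at

\[
\theta_{\min} = -\frac{(1 - \|A\|_1)\,t}{4(a\mathbb{E}g(X) + b)},
\]

which satisfies $0 > \theta_{\min} > -\frac{1 - \|A\|_1}{4a}$, and thus

\[
\log \mathbb{P}(f(X) \le -t) \le -\frac{(1 - \|A\|_1)\,t^2}{8(a\mathbb{E}g(X) + b)}.
\]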

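Finally we need to check the case when $a < (1 - \|A\|_1)\,a_c$. Going back to equation (6.11), we can write that for $0 > \theta > -\frac{1 - \|A\|_1}{4a}$,

\[
m'(\theta) \ge -(e^{-\theta} - 1)\, \frac{5/2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b)\, m(\theta), \tag{6.14}
\]

\[
\log(m(\theta))' \ge -(e^{-\theta} - 1)\, \frac{5/2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b),
\]

\[
\log(m(\theta)) \le (e^{-\theta} + \theta - 1)\, \frac{5/2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b).
\]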
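Let us write $C := \frac{5/2}{1 - \|A\|_1}\,(a\mathbb{E}g(X) + b)$, then by Markov's inequality, we have that for $0 > \theta > -\frac{1 - \|A\|_1}{4a}$, $0 \le t \le \mathbb{E}g(X)$,

\[
\log(\mathbb{P}(f(X) \le -t)) \le \log(m(\theta)) + \theta t \le (e^{-\theta} + \theta - 1)\,C + \theta t.
\]

The minimum of the right hand side is taken at

\[
\theta_{\min} = -\log\left(1 + \frac{t}{C}\right) \ge -\log\left(1 + \frac{2}{5} \cdot \frac{1 - \|A\|_1}{a}\right),
\]

which satisfies $0 > \theta_{\min} > -\frac{1 - \|A\|_1}{4a}$ whenever $a < a_c(1 - \|A\|_1)$. Thus, in this case we have

\[
\log(\mathbb{P}(f(X) \le -t)) \le \left(\frac{t}{C} - \log\left(1 + \frac{t}{C}\right)\right) C - \log\left(1 + \frac{t}{C}\right) t
= C\left(\frac{t}{C} - \log\left(1 + \frac{t}{C}\right)\left(1 + \frac{t}{C}\right)\right).
\]

Now let us take a look at the function $x - \log(1 + x)(1 + x)$ for positive $x$. We can easily check that this is negative, and

\[
x - \log(1 + x)(1 + x) \le -\frac{x^2}{2 + (2/3)x},
\]

so

\[
\log(\mathbb{P}(f(X) \le -t)) \le -\frac{t^2}{2C + (2/3)t} = -\frac{t^2}{5(a\mathbb{E}g(X) + b)/(1 - \|A\|_1) + (2/3)t}.
\]

□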
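The claim that $\theta_{\min}$ stays in the admissible range exactly in the small-$a$ regime reduces, with $K = \frac{1 - \|A\|_1}{4a}$, to the scalar inequality $\log(1 + \frac{8}{5}K) < K$, which holds when $K > K_c$. The sketch below checks this on a few sample points; the sample values and the helper name `theta_min_in_range` are assumptions made for illustration.

```python
import math

K_C = 0.876405  # value of K_c quoted in the text

def theta_min_in_range(K):
    """True when log(1 + (8/5) K) < K, i.e. the worst-case theta_min exceeds -K."""
    return math.log1p(1.6 * K) < K

above = [0.9, 1.0, 2.0, 5.0, 20.0]   # K > K_c, i.e. a < a_c (1 - ||A||_1)
below = [0.1, 0.5, 0.85]             # K < K_c, i.e. a > a_c (1 - ||A||_1)

ok_above = all(theta_min_in_range(K) for K in above)
ok_below = not any(theta_min_in_range(K) for K in below)
```

The two conditions are the same statement: $\log(1 + \frac{8}{5}K) < K$ is equivalent to $(e^{K} - 1)/K > 8/5$, the reverse of (6.12).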



7. Appendix 

7.1. Proofs of lemmas about $d_T(x, S)$ and $d_T^2(x, S)$

Proof of Lemma 3.1. The proof is similar to the proof of Proposition 13 of Boucheron, Lugosi and Massart (2003).

Let $\mathcal{M}(S)$ denote the set of probability measures on $S$. Then, using Sion's minimax theorem (see Sion (1958), and Komiya (1988)), we may rewrite $d_T$ as

\[
d_T(x, S) = \inf_{\nu \in \mathcal{M}(S)}\ \sup_{\alpha : \|\alpha\|_2 \le 1}\ \sum_{j=1}^{n} \alpha_j\, \mathbb{E}_\nu\big[1_{x_j \ne Y_j}\big], \tag{7.1}
\]

where $Y = (Y_1, \ldots, Y_n)$ is distributed according to $\nu$.

Rather than minimizing in the large space $\mathcal{M}(S)$, we may as well perform minimization on the convex compact set of the probability measures on $\{0,1\}^n$ by mapping $y \in S$ on $(1_{y_j \ne x_j})_{1 \le j \le n}$. Denote this mapping by $\chi$. Note that the mapping depends on $x$ but we omit this dependence to lighten notation. The set $\mathcal{M}(S) \circ \chi^{-1}$ of probability measures on $\{0,1\}^n$ coincides with $\mathcal{M}(\chi(S))$. It is convex and compact and therefore the infimum in the last display is achieved at some $\hat{\nu}$. Then $d_T(x, S)$ is just the Euclidean norm of the vector $\big(\mathbb{E}_{\hat{\nu}}[1_{x_j \ne Y_j}]\big)_{1 \le j \le n}$, and therefore the supremum in (7.1) is achieved by the vector $\hat{\alpha}$ of components

\[
\hat{\alpha}_i = \frac{\mathbb{E}_{\hat{\nu}}\big[1_{x_i \ne Y_i}\big]}{\sqrt{\sum_{j=1}^{n} \big(\mathbb{E}_{\hat{\nu}}[1_{x_j \ne Y_j}]\big)^2}}.
\]
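The fact that the supremum of a linear form over the Euclidean unit ball is the norm of its coefficient vector, attained at the normalised vector, can be illustrated numerically. In the sketch below the vector `v` is an arbitrary stand-in for $(\mathbb{E}_{\hat{\nu}}[1_{x_j \ne Y_j}])_j$ (an assumption, not data from the paper), and the random search is only a sanity check.

```python
import math
import random

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def normalise(v):
    length = norm2(v)
    return [x / length for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v = [0.2, 0.5, 0.0, 0.9]      # illustrative stand-in for the expectation vector
alpha_hat = normalise(v)      # candidate maximiser
value_at_alpha_hat = dot(alpha_hat, v)

# Random unit vectors with nonnegative entries never exceed the norm of v.
rng = random.Random(1)
best_random = max(
    dot(normalise([rng.random() for _ in v]), v) for _ in range(2000)
)
```

This is the Cauchy-Schwarz inequality with equality at $\alpha = v/\|v\|_2$, and it also shows that $\hat{\alpha}$ has unit Euclidean norm with entries in $[0, 1]$.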

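To this end, we may use once again Sion's minimax theorem to write the convex distance as

\[
\begin{aligned}
d_T(x, S) &= \inf_{\nu \in \mathcal{M}(S)}\ \sup_{\alpha : \|\alpha\|_2 \le 1}\ \sum_{j=1}^{n} \alpha_j\, \mathbb{E}_\nu\big[1_{x_j \ne Y_j}\big] \\
&= \sup_{\alpha : \|\alpha\|_2 \le 1}\ \inf_{\nu \in \mathcal{M}(S)}\ \sum_{j=1}^{n} \alpha_j\, \mathbb{E}_\nu\big[1_{x_j \ne Y_j}\big].
\end{aligned}
\]

Denote the pair $(\nu, \alpha)$ at which the saddle point is achieved by $(\hat{\nu}, \hat{\alpha})$.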
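Now let us write

\[
d_T(y, S) = \inf_{\nu \in \mathcal{M}(S)}\ \sup_{\alpha : \|\alpha\|_2 \le 1}\ \sum_{j=1}^{n} \alpha_j\, \mathbb{E}_\nu\big[1_{y_j \ne Y_j}\big] \ge \inf_{\nu \in \mathcal{M}(S)}\ \sum_{j=1}^{n} \hat{\alpha}_j\, \mathbb{E}_\nu\big[1_{y_j \ne Y_j}\big].
\]

Let $\tilde{\nu}$ denote the distribution on $S$ that achieves the infimum in the latter expression. Then we have

\[
d_T(x, S) = \inf_{\nu \in \mathcal{M}(S)}\ \sum_{j=1}^{n} \hat{\alpha}_j\, \mathbb{E}_\nu\big[1_{x_j \ne Y_j}\big] \le \sum_{j=1}^{n} \hat{\alpha}_j\, \mathbb{E}_{\tilde{\nu}}\big[1_{x_j \ne Y_j}\big].
\]

Hence

\[
d_T(x, S) - d_T(y, S) \le \sum_{j=1}^{n} \hat{\alpha}_j\, \mathbb{E}_{\tilde{\nu}}\big[1_{x_j \ne Y_j} - 1_{y_j \ne Y_j}\big] \le \hat{\alpha}_i,
\]

$0 \le \hat{\alpha}_i \le 1$, and $\sum_{i=1}^{n} \hat{\alpha}_i^2 = 1$, so the result follows. □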

Proof of Lemma 3.2. The second claim is proven in Lemma 1 of Boucheron, Lugosi and Massart (2009).

We can suppose without loss of generality that $d_T(y, S) \le d_T(x, S)$, then

\[
\begin{aligned}
d_T^2(x, S) - d_T^2(y, S) &= (d_T(x, S) - d_T(y, S))(d_T(x, S) + d_T(y, S)) \\
&\le (d_T(x, S) - d_T(y, S))\, 2 d_T(x, S) \le 2 d_T(x, S)\, \hat{\alpha}_i,
\end{aligned}
\]

where $\hat{\alpha}_i$ is as defined in the proof of Lemma 3.1. If we denote

\[
\alpha_i(x) := 2 d_T(x, S)\, \hat{\alpha}_i,
\]

then

\[
\sum_{i=1}^{n} \alpha_i(x)^2 \le 4 d_T^2(x, S),
\]

so the claim follows. □